public inbox for linux-rdma@vger.kernel.org
* Problem with RDMA device removal architecture
@ 2010-03-26 15:57 Steve Wise
       [not found] ` <4BACD985.1070906-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Wise @ 2010-03-26 15:57 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma, Sean Hefty

Hey Roland and RDMA experts,

I'd like to raise an issue with the architecture of the Linux RDMA 
subsystem regarding device removal and RDMA provider deregistration:

IBM/PPC and probably other vendors/platforms have virtual or logical 
partitions running Linux and they want to be able to add or remove 
devices, including rdma devices, in a hot-plug fashion.  They also want 
to be able to "reset" a failed device (EEH events). For other networking 
devices, this works fine.  With RDMA devices, however, it is possible 
for user mode RDMA applications to totally hang the device removal 
process by virtue of the fact that they don't release all their uverbs 
contexts and rdma cm ids.  If an application, for example,  allocates 
and binds an rdma cm id, then just goes to sleep forever, that will hang 
the removal of the underlying device.   Here is the path I'm talking about:

0) an evil application has an rdma cm id bound to rdma device A.  The 
application is just sleeping doing nothing else.

1) device A event happens causing the device to unregister itself with 
the RDMA core.  This could be an EEH event requiring full device reset, 
or an OS hot-plug removal event.

2) device A calls ib_unregister_device().  This results in calls to all 
RDMA kernel clients' remove() function.

3) rdma_cm:cma_remove_one() and friends end up posting 
RDMA_CM_EVENT_DEVICE_REMOVAL events to all kernel users.

4) rdma_ucm gets this event and dutifully posts it for the user app to 
reap.   But since the app doesn't reap this event and exit or at least 
destroy the cm id, nothing else happens.

5) rdma_cm blocks awaiting all references on the device to go away.  
Since there is an allocated cm id, it will block forever.

Similar logic exists in uverbs as well, I think, but with a uverbs 
context as the object that must be released by the application. 
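
The hang in steps 0) through 5) can be modeled in user space as a
reference count that the removal path waits on.  This is only a sketch
under stated assumptions: struct model_dev, model_get(), model_put() and
model_remove_can_complete() are hypothetical names for illustration, not
the kernel's rdma_cm or uverbs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a device; not a kernel structure. */
struct model_dev {
    unsigned int refs;   /* one reference per allocated cm id or uverbs context */
};

/* An application allocating and binding a cm id takes a reference. */
void model_get(struct model_dev *dev)
{
    assert(dev != NULL);
    dev->refs++;
}

/* Destroying the cm id (or exiting) drops the reference. */
void model_put(struct model_dev *dev)
{
    assert(dev != NULL && dev->refs > 0);
    dev->refs--;
}

/*
 * Step 5: removal may only finish once every reference is gone.
 * A sleeping application that never calls model_put() keeps this
 * false forever, which is the hang described above.
 */
bool model_remove_can_complete(const struct model_dev *dev)
{
    assert(dev != NULL);
    return dev->refs == 0;
}
```

In this model the "evil" application is simply a caller that does
model_get() once and never calls model_put().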

I propose that this is actually a denial-of-service type issue and we 
should consider ways to fix it.   I believe we've had this discussion 
before but punted on it.  However, I think this is pretty important for 
some OS/platform environments, and I'd like to discuss it again with the 
goal of fixing the code so this issue never happens.

Thoughts? 




Thanks,

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: Problem with RDMA device removal architecture
       [not found] ` <4BACD985.1070906-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-26 16:16   ` Sean Hefty
       [not found]     ` <DAF23AFA2B904B32B418D9C8798D4276-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2010-03-26 16:47   ` Tung, Chien Tin
  1 sibling, 1 reply; 25+ messages in thread
From: Sean Hefty @ 2010-03-26 16:16 UTC (permalink / raw)
  To: 'Steve Wise', Roland Dreier; +Cc: linux-rdma

>4) rdma_ucm gets this event and dutifully posts it for the user app to
>reap.   But since the app doesn't reap this event and exit or at least
>destroy the cm id, nothing else happens.

The rdma_ucm should post the event, but destroy the underlying
rdma_cm_id (possibly by returning non-zero from the remove callback or from
another thread).  The only call from user space that the rdma_ucm will allow
to succeed at that point is destroy.  State checking and synchronization would
need to be used to mark that the kernel id has already been freed.

We just need to ensure that the rdma_ucm doesn't try to destroy an id that is in
another downcall, and I think the synchronization will be non-trivial.

- Sean


* Re: Problem with RDMA device removal architecture
       [not found]     ` <DAF23AFA2B904B32B418D9C8798D4276-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-03-26 16:36       ` Steve Wise
       [not found]         ` <4BACE28A.2080409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Wise @ 2010-03-26 16:36 UTC (permalink / raw)
  To: Sean Hefty; +Cc: Roland Dreier, linux-rdma

Sean Hefty wrote:
>> 4) rdma_ucm gets this event and dutifully posts it for the user app to
>> reap.   But since the app doesn't reap this event and exit or at least
>> destroy the cm id, nothing else happens.
>>     
>
> For the rdma_ucm, it should post the event, but destroy the underlying
> rdma_cm_id (possibly by returning non-zero from the remove callback or from
> another thread).  The only call that the rdma_ucm will succeed from user space
> at that point is destroy.  State checking and synchronization would need to be
> used to mark that the kernel id has already been freed.
>
> We just need to ensure that the rdma_ucm doesn't try to destroy an id that is in
> another downcall, and I think the synchronization will be non-trivial.
>
>   

In addition I think there is an assumption in the rdma_ucm that the 
underlying rdma_cm_id exists whenever the ucma context is still valid.  
We might need some state in the ucma context that sez "no rdma_cm_id 
exists".    Then all the ucma code will have to check this before 
utilizing the rdma_cm_id.  Maybe just checking the ctx->cm_id pointer is 
sufficient.  

In other words, I think we want the ucma context to stay around until 
the application destroys it (via explicit means or via exit).   But the 
rdma_cm_id gets destroyed immediately upon receiving a DEVICE_REMOVAL event.
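
A minimal user-space sketch of that idea, assuming a NULL cm_id pointer
is the "no rdma_cm_id exists" state.  The struct and function names
(model_ucma_ctx, model_ctx_bind() and so on) are hypothetical and do not
match the real ucma code, and the locking that Sean points out would be
needed is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real ucma structures differ. */
struct model_cm_id {
    int bound;
};

struct model_ucma_ctx {
    struct model_cm_id *cm_id;   /* NULL once the device has been removed */
};

/* Device removal: detach the kernel id, keep the user-visible context. */
void model_ctx_device_removed(struct model_ucma_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->cm_id = NULL;   /* the caller is assumed to free the id itself */
}

/* Any ordinary downcall has to check for a detached context first. */
int model_ctx_bind(struct model_ucma_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->cm_id == NULL)
        return -ENODEV;
    ctx->cm_id->bound = 1;
    return 0;
}

/* Destroy is the one call that still succeeds on a detached context. */
int model_ctx_destroy(struct model_ucma_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->cm_id = NULL;
    return 0;
}
```

The choice of -ENODEV for a detached context is an assumption made for
this sketch only.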

Steve.





* RE: Problem with RDMA device removal architecture
       [not found] ` <4BACD985.1070906-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2010-03-26 16:16   ` Sean Hefty
@ 2010-03-26 16:47   ` Tung, Chien Tin
       [not found]     ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD841-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Tung, Chien Tin @ 2010-03-26 16:47 UTC (permalink / raw)
  To: Steve Wise, Roland Dreier; +Cc: linux-rdma, Hefty, Sean

[...]

>  They also want
>to be able to "reset" a failed device (EEH events

[...]

>1) device A event happens causing the device to unregister itself with
>the RDMA core.  This could be an EEH event requiring full device reset,
>or an OS hot-plug removal event.

Just to nit-pick on terminology.  When you say reset a device, do you mean
adapter (or silicon) reset?  In the case of a multiport adapter, does that
mean all devices for that adapter will post an
RDMA_CM_EVENT_DEVICE_REMOVAL event?

Chien


* RE: Problem with RDMA device removal architecture
       [not found]         ` <4BACE28A.2080409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-26 16:50           ` Tung, Chien Tin
       [not found]             ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD851-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2010-03-26 17:08           ` Roland Dreier
  2010-03-26 17:08           ` Sean Hefty
  2 siblings, 1 reply; 25+ messages in thread
From: Tung, Chien Tin @ 2010-03-26 16:50 UTC (permalink / raw)
  To: Steve Wise, Hefty, Sean; +Cc: Roland Dreier, linux-rdma

>In other words, I think we want the ucma context to stay around until
>the application destroys it (via explicit means or via exit).   But the
>rdma_cm_id gets destroyed immediately upon receiving a DEVICE_REMOVE event.

How do we "take care" of evil applications that won't go away?

Chien

* Re: Problem with RDMA device removal architecture
       [not found]     ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD841-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-03-26 16:53       ` Steve Wise
       [not found]         ` <4BACE695.4010006-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Wise @ 2010-03-26 16:53 UTC (permalink / raw)
  To: Tung, Chien Tin; +Cc: Roland Dreier, linux-rdma, Hefty, Sean

Tung, Chien Tin wrote:
> [...]
>
>   
>>  They also want
>> to be able to "reset" a failed device (EEH events
>>     
>
> [...]
>
>   
>> 1) device A event happens causing the device to unregister itself with
>> the RDMA core.  This could be an EEH event requiring full device reset,
>> or an OS hot-plug removal event.
>>     
>
> Just to nit-pick on terminology.  When you say reset a device do you mean
> adapter(or silicon) reset?  In the case of multiport adapter, does that
> mean all devices for that adapter will post RDMA_CM_EVENT_DEVICE_REMOVAL
> event?
>
>   

I mean whatever you call a device and register with the RDMA core via 
ib_register_device().



* Re: Problem with RDMA device removal architecture
       [not found]             ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD851-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-03-26 16:59               ` Steve Wise
       [not found]                 ` <4BACE7E8.3040803-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Wise @ 2010-03-26 16:59 UTC (permalink / raw)
  To: Tung, Chien Tin; +Cc: Hefty, Sean, Roland Dreier, linux-rdma

Tung, Chien Tin wrote:
>> In other words, I think we want the ucma context to stay around until
>> the application destroys it (via explicit means or via exit).   But the
>> rdma_cm_id gets destroyed immediately upon receiving a DEVICE_REMOVE event.
>>     
>
> How do we "take care" of evil applications that won't go away?
>
>   

Since the low-level rdma_cm_id _is_ destroyed, the RDMA device can 
unload and go away.  The evil app is then only wasting the ucma 
contexts, file descriptors, etc.

Now, we could also consider something more abrupt like delivering a 
SIGABRT or SIGBUS to all processes that have objects allocated for the 
device that is going away.    But that is kind of drastic, and if done 
unconditionally, will kill applications that want to process the device 
removal and free the objects using that device, but still want to 
continue running on the available devices.   So I wouldn't recommend we 
deliver fatal signals...

Steve.



* RE: Problem with RDMA device removal architecture
       [not found]         ` <4BACE695.4010006-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-26 17:02           ` Tung, Chien Tin
  0 siblings, 0 replies; 25+ messages in thread
From: Tung, Chien Tin @ 2010-03-26 17:02 UTC (permalink / raw)
  To: Steve Wise; +Cc: Roland Dreier, linux-rdma, Hefty, Sean

>>>  They also want
>>> to be able to "reset" a failed device (EEH events
>>>
>>
>> [...]
>>
>>
>>> 1) device A event happens causing the device to unregister itself with
>>> the RDMA core.  This could be an EEH event requiring full device reset,
>>> or an OS hot-plug removal event.
>>>
>>
>> Just to nit-pick on terminology.  When you say reset a device do you mean
>> adapter(or silicon) reset?  In the case of multiport adapter, does that
>> mean all devices for that adapter will post RDMA_CM_EVENT_DEVICE_REMOVAL
>> event?
>>
>>
>
>I mean whatever you call a device and register it with the RDMA core via
>ib_register_device().

Thank you, now I feel better.  :-)

Chien

* Re: Problem with RDMA device removal architecture
       [not found]         ` <4BACE28A.2080409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2010-03-26 16:50           ` Tung, Chien Tin
@ 2010-03-26 17:08           ` Roland Dreier
       [not found]             ` <adazl1vxb7p.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  2010-03-26 17:08           ` Sean Hefty
  2 siblings, 1 reply; 25+ messages in thread
From: Roland Dreier @ 2010-03-26 17:08 UTC (permalink / raw)
  To: Steve Wise; +Cc: Sean Hefty, linux-rdma

 > In other words, I think we want the ucma context to stay around until
 > the application destroys it (via explicit means or via exit).   But
 > the rdma_cm_id gets destroyed immediately upon receiving a
 > DEVICE_REMOVE event.

Yes.  The RDMA CM is somewhat easier, but I think the basic idea should
be that ucma internally detaches the userspace context from anything
that is holding a device reference, and marks the context as "dead"
(only valid operation is destroying it).

uverbs is trickier, because a userspace process will typically have some
hardware resources directly mmap'ed.  We need a way to "revoke" that
mmap and have it point at a dummy page until userspace releases it --
and last I looked I wasn't sure how to do that.

 - R.
-- 
Roland Dreier  <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

* RE: Problem with RDMA device removal architecture
       [not found]         ` <4BACE28A.2080409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2010-03-26 16:50           ` Tung, Chien Tin
  2010-03-26 17:08           ` Roland Dreier
@ 2010-03-26 17:08           ` Sean Hefty
       [not found]             ` <AB19885D9F4245DFABBCB131959382C5-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  2 siblings, 1 reply; 25+ messages in thread
From: Sean Hefty @ 2010-03-26 17:08 UTC (permalink / raw)
  To: 'Steve Wise'; +Cc: Roland Dreier, linux-rdma

>In other words, I think we want the ucma context to stay around until
>the application destroys it (via explicit means or via exit).   But the
>rdma_cm_id gets destroyed immediately upon receiving a DEVICE_REMOVE event.

Yes - this is what I was trying to say.

The problem is trying to destroy the rdma_cm_id 'immediately' upon receiving a
device remove event.  The user could be in a separate downcall into the rdma_cm
at the time the device removal occurs, and we cannot call rdma_destroy_id while
calling rdma_foo() for the same id.  Plus, 'foo' may be 'destroy_id', and we
can't destroy the id twice.

For the trivial case, if the rdma_cm_id is not in use, we just return -1 to the
remove device callback to destroy the id.  But if the rdma_cm_id is in use, then
we must schedule the destruction of the rdma_cm_id to a separate thread and
synchronize that thread against user space destroying the id before the thread
can run.
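
The "destroy exactly once, and never under an active downcall" rule can
be sketched in user space with C11 atomics.  Note the substitution:
instead of the separate thread described above, this sketch lets the
last downcall to leave perform the deferred destroy.  All names
(struct model_id, model_id_enter() and so on) are hypothetical, and
destroy_calls only stands in for a real destroy call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_id {
    atomic_int  users;         /* downcalls currently using the id */
    atomic_bool removed;       /* set by the device-removal callback */
    atomic_flag destroyed;     /* set by whoever wins the right to destroy */
    int         destroy_calls; /* stand-in for the real destroy call */
};

void model_id_init(struct model_id *id)
{
    assert(id != NULL);
    atomic_init(&id->users, 0);
    atomic_init(&id->removed, false);
    atomic_flag_clear(&id->destroyed);
    id->destroy_calls = 0;
}

/* Returns true only for the single caller that gets to destroy the id. */
static bool model_id_try_destroy(struct model_id *id)
{
    if (atomic_flag_test_and_set(&id->destroyed))
        return false;          /* someone else already destroyed it */
    id->destroy_calls++;
    return true;
}

/* End of a downcall: the last user out performs a deferred destroy. */
void model_id_leave(struct model_id *id)
{
    if (atomic_fetch_sub(&id->users, 1) == 1 && atomic_load(&id->removed))
        model_id_try_destroy(id);
}

/* Start of a downcall ("rdma_foo()"): refused once the device is gone. */
bool model_id_enter(struct model_id *id)
{
    atomic_fetch_add(&id->users, 1);
    if (atomic_load(&id->removed)) {
        model_id_leave(id);
        return false;
    }
    return true;
}

/* Device-removal callback: destroy now if idle, otherwise defer. */
bool model_id_device_removed(struct model_id *id)
{
    atomic_store(&id->removed, true);
    if (atomic_load(&id->users) == 0)
        return model_id_try_destroy(id);
    return false;              /* an active downcall finishes the job */
}

/* Explicit destroy from user space; a no-op if removal got there first. */
bool model_id_user_destroy(struct model_id *id)
{
    return model_id_try_destroy(id);
}
```

The idle case corresponds to the trivial path above (destroy directly
from the remove callback); the in-use case shows where the deferral and
the guard against a second destroy come in.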


* RE: Problem with RDMA device removal architecture
       [not found]                 ` <4BACE7E8.3040803-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-26 17:29                   ` Tung, Chien Tin
       [not found]                     ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD8DD-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Tung, Chien Tin @ 2010-03-26 17:29 UTC (permalink / raw)
  To: Steve Wise; +Cc: Hefty, Sean, Roland Dreier, linux-rdma

>Tung, Chien Tin wrote:
>>> In other words, I think we want the ucma context to stay around until
>>> the application destroys it (via explicit means or via exit).   But the
>>> rdma_cm_id gets destroyed immediately upon receiving a DEVICE_REMOVE event.
>>>
>>
>> How do we "take care" of evil applications that won't go away?
>>
>>
>
>Since the low level rdma_cm_id _is_ destroyed, then the RDMA device can
>unload and go away.  The evil app then only is wasting the ucma
>contexts, file descriptors, etc.

Yup, let someone else hold the bag...

>Now, we could also consider something more abrupt like delivering a
>SIGABRT or SIGBUS to all processes that have objects allocated for the
>device that is going away.    But that is kind of drastic, and if done
>unconditionally, will kill applications that want to process the device
>removal and free the objects using that device, but still want to
>continue running on the available devices.   So I wouldn't recommend we
>deliver fatal signals...


I'm against violence as well.  But to Roland's point, how will we unmap resources?
If an application won't respond to a device removal event and clean up properly,
perhaps it is "okay" to let it crash.  Alternatively, what about an
RDMA_CM_EVENT_DEVICE_REMOVAL_PENDING and RDMA_CM_EVENT_DEVICE_REMOVED scheme?
Post the first event to allow good applications to clean up.  Post the second
event to notify apps that the device is "gone".  After the second event, we can
then get violent and shoot to kill?

Chien

* Re: Problem with RDMA device removal architecture
       [not found]             ` <AB19885D9F4245DFABBCB131959382C5-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-03-26 17:42               ` Steve Wise
  0 siblings, 0 replies; 25+ messages in thread
From: Steve Wise @ 2010-03-26 17:42 UTC (permalink / raw)
  To: Sean Hefty; +Cc: Roland Dreier, linux-rdma

Sean Hefty wrote:
>> In other words, I think we want the ucma context to stay around until
>> the application destroys it (via explicit means or via exit).   But the
>> rdma_cm_id gets destroyed immediately upon receiving a DEVICE_REMOVE event.
>>     
>
> Yes - this is what I was trying to say.
>
> The problem is trying to destroy the rdma_cm_id 'immediately' upon receiving a
> device remove event.  The user could be in a separate downcall into the rdma_cm
> at the time the device removal occurs, and we cannot call rdma_destroy_id while
> calling rdma_foo() for the same id.  Plus, 'foo' may be 'destroy_id', and we
> can't destroy the id twice.
>
> For the trivial case, if the rdma_cm_id is not in use, we just return -1 to the
> remove device callback to destroy the id.  But if the rdma_cm_id is in use, then
> we must schedule the destruction of the rdma_cm_id to a separate thread and
> synchronize that thread against user space destroying the id before the thread
> can run.
>
>   

I don't think you want to schedule another thread.  The semantics of the 
device removal and EEH events are that when the driver returns, all usage 
of the device has stopped and it can be removed, reset, etc.  Because of 
this, these events are run on a thread so they can block if needed.

Steve.

* Re: Problem with RDMA device removal architecture
       [not found]             ` <adazl1vxb7p.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-03-26 17:47               ` Steve Wise
  2010-03-26 18:29               ` Steve Wise
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Steve Wise @ 2010-03-26 17:47 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Sean Hefty, linux-rdma

Roland Dreier wrote:
>  > In other words, I think we want the ucma context to stay around until
>  > the application destroys it (via explicit means or via exit).   But
>  > the rdma_cm_id gets destroyed immediately upon receiving a
>  > DEVICE_REMOVE event.
>
> Yes.  The RDMA CM is somewhat easier, but I think the basic idea should
> be that ucma internally detaches the userspace context from anything
> that is holding a device reference, and marks the context as "dead"
> (only valid operation is destroying it).
>
> uverbs is trickier, because a userspace process will typically have some
> hardware resources directly mmap'ed.  We need a way to "revoke" that
> mmap and have it point at a dummy page until userspace releases it --
> and last I looked I wasn't sure how to do that.
>   
Hmm.  Yes, now I remember our last discussion and why we punted. :) 

Also, this is a device-specific issue.  Each rdma device driver/provider 
would have to deal with this.  But like you said, if the driver can 
revoke and remap to a dummy page, then we could avoid inadvertent 
process crashes...

Anyone out there know how we can do this? 



* Re: Problem with RDMA device removal architecture
       [not found]                     ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD8DD-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-03-26 17:58                       ` Steve Wise
       [not found]                         ` <4BACF5B8.7090304-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Wise @ 2010-03-26 17:58 UTC (permalink / raw)
  To: Tung, Chien Tin; +Cc: Hefty, Sean, Roland Dreier, linux-rdma


> But to Roland's point, how will we unmap resources?
> If an application won't respond to device removal event and clean up properly,
> perhaps it is "okay" to let it crash.  Alternatively, what about a
> RDMA_CM_EVENT_DEVICE_REMOVAL_PENDING and RDMA_CM_EVENT_DEVICE_REMOVED scheme.
> Post the first event to allow good applications to clean up.  The second event
> to notify apps that the device is "gone".  After the second event, we can then
> get violent and shoot to kill?
>   

You probably don't need the two events as you can detect when the apps 
free up these resources anyway.   So your proposal boils down to:  post 
the DEVICE_REMOVAL/DEVICE_FATAL events, and wait some amount of time.  
After said timeout, you fire a SIGBUS at each process still owning 
resources for the device in question.

Roland, is that terrible in your opinion?



* Re: Problem with RDMA device removal architecture
       [not found]                         ` <4BACF5B8.7090304-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-26 18:05                           ` Steve Wise
  0 siblings, 0 replies; 25+ messages in thread
From: Steve Wise @ 2010-03-26 18:05 UTC (permalink / raw)
  To: Tung, Chien Tin; +Cc: Hefty, Sean, Roland Dreier, linux-rdma

Steve Wise wrote:
>
>> But to Roland's point, how will we unmap resources?
>> If an application won't respond to device removal event and clean up 
>> properly,
>> perhaps it is "okay" to let it crash.  Alternatively, what about a
>> RDMA_CM_EVENT_DEVICE_REMOVAL_PENDING and RDMA_CM_EVENT_DEVICE_REMOVED 
>> scheme.
>> Post the first event to allow good applications to clean up.  The 
>> second event
>> to notify apps that the device is "gone".  After the second event, we 
>> can then
>> get violent and shoot to kill?
>>   
>
> You probably don't need the two events as you can detect when the apps 
> free up these resources anyway.   So your proposal boils down to:  
> post the DEVICE_REMOVAL/ DEVICE_FATAL events, and wait some amount of 
> time.  After said timeout, you fire a SIGBUS at each process still 
> owning resources for the device in question.
>
> Roland, is that terrible in your opinion?
>
>

Actually, we don't need to deliver the signal at all.   Just continue 
with the device removal after the timeout. The mapped resources would 
get unmapped, I guess, and then accessing them would cause a fault in 
the process. 

So we try to wait for well-behaved apps, but we don't hang the device 
removal forever...
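
The bounded wait can be written as a small decision function.  This is
a sketch only: struct removal_state, removal_poll() and the millisecond
clock passed in by the caller are assumptions for illustration, and no
timeout value is implied.

```c
#include <assert.h>
#include <stddef.h>

enum removal_action {
    REMOVAL_WAIT,    /* references remain and the grace period is running */
    REMOVAL_CLEAN,   /* every application released its resources */
    REMOVAL_FORCED   /* grace period expired; continue with removal anyway */
};

/* Hypothetical bookkeeping for one device that is being removed. */
struct removal_state {
    unsigned int  refs;         /* resources still held by applications */
    unsigned long deadline_ms;  /* time at which waiting stops */
};

/* Decide what the removal path should do at time now_ms. */
enum removal_action removal_poll(const struct removal_state *st,
                                 unsigned long now_ms)
{
    assert(st != NULL);
    if (st->refs == 0)
        return REMOVAL_CLEAN;
    if (now_ms >= st->deadline_ms)
        return REMOVAL_FORCED;
    return REMOVAL_WAIT;
}
```

A removal path built on this would post the event, call removal_poll()
until it stops returning REMOVAL_WAIT, and then proceed either way.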


* Re: Problem with RDMA device removal architecture
       [not found]             ` <adazl1vxb7p.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  2010-03-26 17:47               ` Steve Wise
@ 2010-03-26 18:29               ` Steve Wise
       [not found]                 ` <4BACFCF5.6030501-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  2010-03-26 20:18               ` Tung, Chien Tin
  2010-03-26 22:54               ` Sean Hefty
  3 siblings, 1 reply; 25+ messages in thread
From: Steve Wise @ 2010-03-26 18:29 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Sean Hefty, linux-rdma


Roland Dreier wrote:
> We need a way to "revoke" that
> mmap and have it point at a dummy page until userspace releases it --
> and last I looked I wasn't sure how to do that.

do_mremap()?



* Re: Problem with RDMA device removal architecture
       [not found]                 ` <4BACFCF5.6030501-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-26 18:55                   ` Roland Dreier
       [not found]                     ` <adask7myktq.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Roland Dreier @ 2010-03-26 18:55 UTC (permalink / raw)
  To: Steve Wise; +Cc: Sean Hefty, linux-rdma

 > > We need a way to "revoke" that
 > > mmap and have it point at a dummy page until userspace releases it --
 > > and last I looked I wasn't sure how to do that.

 > do_mremap()?

Not exported to modules, is it?
-- 
Roland Dreier  <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>

* Re: Problem with RDMA device removal architecture
       [not found]                         ` <4BAD047A.1000408-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-03-26 18:59                           ` Roland Dreier
       [not found]                             ` <adaociaykn1.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Roland Dreier @ 2010-03-26 18:59 UTC (permalink / raw)
  To: Steve Wise; +Cc: Sean Hefty, linux-rdma

I'll ask the experts on LKML when I get a chance to write up the problem description.
-- 
Roland Dreier  <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>

* Re: Problem with RDMA device removal architecture
       [not found]                     ` <adask7myktq.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-03-26 19:01                       ` Steve Wise
       [not found]                         ` <4BAD047A.1000408-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Wise @ 2010-03-26 19:01 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Sean Hefty, linux-rdma

Roland Dreier wrote:
>  > > We need a way to "revoke" that
>  > > mmap and have it point at a dummy page until userspace releases it --
>  > > and last I looked I wasn't sure how to do that.
>
>  > do_mremap()?
>
> Not exported to modules, is it?
>   

no. :(



* RE: Problem with RDMA device removal architecture
       [not found]             ` <adazl1vxb7p.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
  2010-03-26 17:47               ` Steve Wise
  2010-03-26 18:29               ` Steve Wise
@ 2010-03-26 20:18               ` Tung, Chien Tin
       [not found]                 ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFDB38-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2010-03-26 22:54               ` Sean Hefty
  3 siblings, 1 reply; 25+ messages in thread
From: Tung, Chien Tin @ 2010-03-26 20:18 UTC (permalink / raw)
  To: Roland Dreier, Steve Wise; +Cc: Hefty, Sean, linux-rdma

>uverbs is trickier, because a userspace process will typically have some
>hardware resources directly mmap'ed.  We need a way to "revoke" that
>mmap and have it point at a dummy page until userspace releases it --
>and last I looked I wasn't sure how to do that.


Isn't "revoking" the mmap enough?  Why do we need to remap it
to a dummy page?  Why not just let the app hang on to the allocated virtual
memory until it exits?  I am thinking of registered memory here.

Chien



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Problem with RDMA device removal architecture
       [not found]                 ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFDB38-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-03-26 20:45                   ` Roland Dreier
  0 siblings, 0 replies; 25+ messages in thread
From: Roland Dreier @ 2010-03-26 20:45 UTC (permalink / raw)
  To: Tung, Chien Tin; +Cc: Steve Wise, Hefty, Sean, linux-rdma

 > >uverbs is trickier, because a userspace process will typically have some
 > >hardware resources directly mmap'ed.  We need a way to "revoke" that
 > >mmap and have it point at a dummy page until userspace releases it --
 > >and last I looked I wasn't sure how to do that.

 > Isn't "revoking" the mmap enough, why do we need to remap it
 > to a dummy page?  Why not just let the app hang on to the allocated virtual
 > memory until it exits?  I am thinking of registered memory here.

If you revoke the mmap, then presumably the process will seg fault if it
tries to access the page.  So you need to point the map to some real
page to handle accesses until the process actually closes the device.

(And I think what we'll want to do is queue up the "catastrophic error"
async event and then immediately allow the low-level device to reset, so
we need to handle the mmap in a way that doesn't kill the app; otherwise
there's no way for even a well-behaved app to avoid crashing, since it
might be about to do an MMIO to the mmapped registers just as the async
event is created.)
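
A user-space analogy of the "point the map at a dummy page" idea, as a
minimal sketch only. It is not kernel code and not how uverbs implements
anything: the "device page" here is an ordinary anonymous shared mapping,
and the helper names (map_fake_device_page, revoke_to_dummy_page,
demo_revoke) are hypothetical. It only shows that replacing a mapping in
place with a zero-filled page lets later loads and stores at the same
address succeed instead of faulting.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for a device register page mapped into the process. */
static volatile uint32_t *map_fake_device_page(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/*
 * "Revoke" the mapping without unmapping it: MAP_FIXED replaces whatever is
 * mapped at regs with a fresh anonymous page, which the kernel zero-fills.
 * The virtual address stays valid, so the process does not seg fault.
 */
static int revoke_to_dummy_page(volatile uint32_t *regs)
{
    long psz = sysconf(_SC_PAGESIZE);
    void *p = mmap((void *)regs, (size_t)psz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == (void *)regs ? 0 : -1;
}

/* Walk through the sequence: use the page, revoke it, keep using it. */
static int demo_revoke(void)
{
    volatile uint32_t *regs = map_fake_device_page();
    if (regs == NULL)
        return -1;

    regs[0] = 0xdeadbeefu;            /* "doorbell" write before removal */
    assert(regs[0] == 0xdeadbeefu);

    if (revoke_to_dummy_page(regs) != 0)
        return -1;

    assert(regs[0] == 0);             /* old contents gone, all-0s page */
    regs[0] = 1;                      /* late write lands harmlessly */

    return munmap((void *)regs, (size_t)sysconf(_SC_PAGESIZE));
}
```

Calling demo_revoke() from main() returns 0 when every step succeeds. The
kernel-side equivalent would have to do this to another process's address
space from the driver, which is the part nobody in this thread knows how to
do cleanly.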
-- 
Roland Dreier  <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Problem with RDMA device removal architecture
       [not found]             ` <adazl1vxb7p.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
                                 ` (2 preceding siblings ...)
  2010-03-26 20:18               ` Tung, Chien Tin
@ 2010-03-26 22:54               ` Sean Hefty
       [not found]                 ` <06FEECB9AB064B309D21BA9AC0A4BFFD-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
  3 siblings, 1 reply; 25+ messages in thread
From: Sean Hefty @ 2010-03-26 22:54 UTC (permalink / raw)
  To: 'Roland Dreier', Steve Wise; +Cc: linux-rdma

>uverbs is trickier, because a userspace process will typically have some
>hardware resources directly mmap'ed.  We need a way to "revoke" that
>mmap and have it point at a dummy page until userspace releases it --

What exactly would the dummy page contain and could the application/library read
it assuming that it is obtaining valid data?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Problem with RDMA device removal architecture
       [not found]                 ` <06FEECB9AB064B309D21BA9AC0A4BFFD-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
@ 2010-03-26 23:00                   ` Roland Dreier
  0 siblings, 0 replies; 25+ messages in thread
From: Roland Dreier @ 2010-03-26 23:00 UTC (permalink / raw)
  To: Sean Hefty; +Cc: Steve Wise, linux-rdma

 > >uverbs is trickier, because a userspace process will typically have some
 > >hardware resources directly mmap'ed.  We need a way to "revoke" that
 > >mmap and have it point at a dummy page until userspace releases it --

 > What exactly would the dummy page contain and could the application/library read
 > it assuming that it is obtaining valid data?

It would be an all-0s page I guess (to avoid leaking kernel data).

What the device driver library would do with it is device-specific, but
e.g. Mellanox uses it write-only.  Not sure if any other devices put
something readable there -- if so, I guess that driver should put a safe
value there if possible.
-- 
Roland Dreier  <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Problem with RDMA device removal architecture
       [not found]                             ` <adaociaykn1.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
@ 2010-05-21 18:06                               ` Steve Wise
       [not found]                                 ` <4BF6CBAA.5020906-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Steve Wise @ 2010-05-21 18:06 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Sean Hefty, linux-rdma, Wen Xiong

Roland Dreier wrote:
> I'll ask the experts on LKML when I get a chance to write up the problem description.
>   

Hey Roland,


Did you ever get any feedback from the LKML folks?

Thanks,

Steve.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Problem with RDMA device removal architecture
       [not found]                                 ` <4BF6CBAA.5020906-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-05-21 18:30                                   ` Roland Dreier
  0 siblings, 0 replies; 25+ messages in thread
From: Roland Dreier @ 2010-05-21 18:30 UTC (permalink / raw)
  To: Steve Wise; +Cc: Sean Hefty, linux-rdma, Wen Xiong

 > > I'll ask the experts on LKML when I get a chance to write up the problem description.

 > Did you ever get any feedback from the LKML folks?

No, I forgot about writing it up.  Thanks for the reminder.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-05-21 18:30 UTC | newest]

Thread overview: 25+ messages
2010-03-26 15:57 Problem with RDMA device removal architecture Steve Wise
     [not found] ` <4BACD985.1070906-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-26 16:16   ` Sean Hefty
     [not found]     ` <DAF23AFA2B904B32B418D9C8798D4276-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-03-26 16:36       ` Steve Wise
     [not found]         ` <4BACE28A.2080409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-26 16:50           ` Tung, Chien Tin
     [not found]             ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD851-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2010-03-26 16:59               ` Steve Wise
     [not found]                 ` <4BACE7E8.3040803-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-26 17:29                   ` Tung, Chien Tin
     [not found]                     ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD8DD-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2010-03-26 17:58                       ` Steve Wise
     [not found]                         ` <4BACF5B8.7090304-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-26 18:05                           ` Steve Wise
2010-03-26 17:08           ` Roland Dreier
     [not found]             ` <adazl1vxb7p.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-03-26 17:47               ` Steve Wise
2010-03-26 18:29               ` Steve Wise
     [not found]                 ` <4BACFCF5.6030501-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-26 18:55                   ` Roland Dreier
     [not found]                     ` <adask7myktq.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-03-26 19:01                       ` Steve Wise
     [not found]                         ` <4BAD047A.1000408-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-26 18:59                           ` Roland Dreier
     [not found]                             ` <adaociaykn1.fsf-BjVyx320WGW9gfZ95n9DRSW4+XlvGpQz@public.gmane.org>
2010-05-21 18:06                               ` Steve Wise
     [not found]                                 ` <4BF6CBAA.5020906-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-05-21 18:30                                   ` Roland Dreier
2010-03-26 20:18               ` Tung, Chien Tin
     [not found]                 ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFDB38-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2010-03-26 20:45                   ` Roland Dreier
2010-03-26 22:54               ` Sean Hefty
     [not found]                 ` <06FEECB9AB064B309D21BA9AC0A4BFFD-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-03-26 23:00                   ` Roland Dreier
2010-03-26 17:08           ` Sean Hefty
     [not found]             ` <AB19885D9F4245DFABBCB131959382C5-Zpru7NauK7drdx17CPfAsdBPR1lH4CV8@public.gmane.org>
2010-03-26 17:42               ` Steve Wise
2010-03-26 16:47   ` Tung, Chien Tin
     [not found]     ` <603F8A3875DCE940BA37B49D0A6EA0AE84CFD841-uLM7Qlg6Mbekrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2010-03-26 16:53       ` Steve Wise
     [not found]         ` <4BACE695.4010006-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-03-26 17:02           ` Tung, Chien Tin
