From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: Roland Dreier <rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Cc: linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: Problem with RDMA device removal architecture
Date: Fri, 26 Mar 2010 10:57:57 -0500
Message-ID: <4BACD985.1070906@opengridcomputing.com>
Hey Roland and RDMA experts,
I'd like to raise an issue with the architecture of the Linux RDMA
subsystem regarding device removal and RDMA provider deregistration:
IBM/PPC and probably other vendors/platforms have virtual or logical
partitions running Linux and they want to be able to add or remove
devices, including rdma devices, in a hot-plug fashion. They also want
to be able to "reset" a failed device (EEH events). For other networking
devices, this works fine. With RDMA devices, however, it is possible
for user-mode RDMA applications to hang the device removal process
indefinitely simply by not releasing all of their uverbs contexts and
rdma cm ids. If an application, for example, allocates
and binds an rdma cm id, then just goes to sleep forever, that will hang
the removal of the underlying device. Here is the path I'm talking about:
0) an evil application has an rdma cm id bound to rdma device A. The
application is just sleeping doing nothing else.
1) an event on device A causes the device to unregister itself from
the RDMA core. This could be an EEH event requiring a full device reset,
or an OS hot-plug removal event.
2) device A calls ib_unregister_device(). This results in a call to the
remove() function of every RDMA kernel client.
3) rdma_cm:cma_remove_one() and friends end up posting
RDMA_CM_EVENT_DEVICE_REMOVAL events to all kernel users.
4) rdma_ucm gets this event and dutifully posts it for the user app to
reap. But since the app doesn't reap this event and exit, or at least
destroy the cm id, nothing else happens.
5) rdma_cm blocks waiting for all references on the device to go away.
Since there is an allocated cm id, it will block forever.
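
To make the path above concrete, here is a toy user-space model in
Python. It is not kernel code and not the rdma_cm implementation: the
names ModelDevice, ModelCmId, sleeping_app, cooperative_app and run_demo
are all made up for illustration, and the timeout exists only so the
demo terminates (the blocking wait in step 5 has no such timeout).

```python
import queue
import threading


class ModelDevice:
    """Hypothetical stand-in for an RDMA device that tracks bound cm ids."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ids = set()

    def bind(self, cm_id):
        with self._cond:
            self._ids.add(cm_id)

    def unbind(self, cm_id):
        with self._cond:
            self._ids.discard(cm_id)
            self._cond.notify_all()

    def unregister(self, timeout):
        """Model of steps 2-5: post a removal event to every bound id, then
        wait for all ids to go away. Returns True if removal completed and
        False if it was still blocked when the (model-only) timeout expired."""
        with self._cond:
            for cm_id in list(self._ids):
                cm_id.events.put("DEVICE_REMOVAL")
            return self._cond.wait_for(lambda: not self._ids, timeout=timeout)


class ModelCmId:
    """Hypothetical stand-in for an rdma cm id bound to a device (step 0)."""

    def __init__(self, device):
        self.device = device
        self.events = queue.Queue()  # events posted for the app to reap
        device.bind(self)

    def destroy(self):
        self.device.unbind(self)


def sleeping_app(cm_id, stop):
    """The 'evil' app: never reaps events and never destroys its id."""
    stop.wait()  # 'stop' is set by the demo only so the thread can exit


def cooperative_app(cm_id):
    """A well-behaved app: reaps the removal event and destroys its id."""
    if cm_id.events.get() == "DEVICE_REMOVAL":
        cm_id.destroy()


def run_demo(app_is_cooperative, timeout=0.5):
    device = ModelDevice()
    cm_id = ModelCmId(device)
    stop = threading.Event()
    if app_is_cooperative:
        app = threading.Thread(target=cooperative_app, args=(cm_id,))
    else:
        app = threading.Thread(target=sleeping_app, args=(cm_id, stop))
    app.start()
    removed = device.unregister(timeout)
    stop.set()
    app.join()
    return removed


if __name__ == "__main__":
    print("cooperative app, removal completed:", run_demo(True))
    print("sleeping app, removal completed:", run_demo(False))
```

In this model, removal completes only when the application reaps the
event and destroys its id. With the sleeping app, the wait ends only
because the model adds a timeout.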
Similar logic exists in uverbs as well, I think, but with a uverbs
context as the object that must be released by the application.
I propose that this is actually a denial-of-service issue and we
should consider ways to fix it. I believe we've had this discussion
before but punted on it. However, I think this is pretty important for
some OS/platform environments, and I'd like to discuss it again with the
goal of fixing the code so this issue never happens.
Thoughts?
Thanks,
Steve.