From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Chuck Lever' <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: 'Devesh Sharma' <Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org>,
    'Shirley Ma' <shirley.ma-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
    "'Hefty, Sean'" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
    'Roland Dreier' <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>,
    linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider module
Date: Mon, 21 Jul 2014 10:22:42 -0500
Message-ID: <005e01cfa4f7$9dd4cc80$d97e6580$@opengridcomputing.com>
In-Reply-To: <D88D1952-83A1-4FF9-B028-AAE7A859A5B1-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
> Sent: Monday, July 21, 2014 10:21 AM
> To: Steve Wise
> Cc: Devesh Sharma; Shirley Ma; Hefty, Sean; Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
>
>
> On Jul 21, 2014, at 11:03 AM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>
> >
> >
> >> -----Original Message-----
> >> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
> >> Sent: Monday, July 21, 2014 9:54 AM
> >> To: Devesh Sharma
> >> Cc: Shirley Ma; Steve Wise; Hefty, Sean; Roland Dreier; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
> >>
> >> Hi Devesh-
> >>
> >> Thanks for drilling into this further.
> >>
> >> On Jul 21, 2014, at 7:48 AM, Devesh Sharma <Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org> wrote:
> >>
> >>> In rpcrdma_ep_connect():
> >>>
> >>> write_lock(&ia->ri_qplock);
> >>> old = ia->ri_id;
> >>> ia->ri_id = id;
> >>> write_unlock(&ia->ri_qplock);
> >>>
> >>> rdma_destroy_qp(old);
> >>> rdma_destroy_id(old); =============> cm_id is destroyed here.
> >>>
> >>>
> >>> If the following code fails in rpcrdma_ep_connect():
> >>> id = rpcrdma_create_id(xprt, ia,
> >>>                 (struct sockaddr *)&xprt->rx_data.addr);
> >>> if (IS_ERR(id)) {
> >>>         rc = -EHOSTUNREACH;
> >>>         goto out;
> >>> }
> >>>
> >>> it leaves the old cm_id still alive. This will always fail if the device
> >>> is removed abruptly.
> >>
> >> For CM_EVENT_DEVICE_REMOVAL, rpcrdma_conn_upcall() sets ep->rep_connected
> >> to -ENODEV.
> >>
> >> Then:
> >>
> >> int
> >> rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
> >> {
> >>         struct rdma_cm_id *id, *old;
> >>         int rc = 0;
> >>         int retry_count = 0;
> >>
> >>         if (ep->rep_connected != 0) {
> >>                 struct rpcrdma_xprt *xprt;
> >> retry:
> >>                 dprintk("RPC: %s: reconnecting...\n", __func__);
> >>
> >> ep->rep_connected is probably -ENODEV after a device removal. It would be
> >> possible for the connect worker to destroy everything associated with this
> >> connection in that case to ensure the underlying object reference counts
> >> are cleared.
> >>
> >> The immediate danger is that if there are pending RPCs, they could exit while
> >> qp/cm_id are NULL, triggering a panic in rpcrdma_deregister_frmr_external().
> >> Checking for NULL pointers inside the ri_qplock would prevent that.
> >>
> >> However, NFS mounts via this adapter will hang indefinitely after all
> >> transports are torn down and the adapter is gone. The only thing that can be
> >> done is something drastic like "echo b > /proc/sysrq-trigger" on the client.
> >>
> >> Thus, IMO hot-plugging or passive fail-over are the only scenarios where
> >> this makes sense. If we have an immediate problem here, is it a problem with
> >> system shutdown ordering that can be addressed in some other way?
> >>
> >> Until that support is in place, obviously I would prefer that the removal of
> >> the underlying driver be prevented while there are NFS mounts in place. I
> >> think that's what NFS users have come to expect.
> >>
> >> In other words, don't allow device removal until we have support for device
> >> insertion :-)
> >>
> >>
> >
> >
> > If we fix the above problems on provider unload, shouldn't the mount recover if the
> > provider module is subsequently loaded? Or another provider configured such that
> > rdma_resolve_addr/route() then picks an active device?
>
> On device removal, we have to destroy everything.
>
> On insertion, we'll have to create a fresh PD and MRs, which isn't done now
> during reconnect. So, insertion needs work too.
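
The asymmetry described above can be illustrated with a toy state model. Everything here is hypothetical: `fake_transport` and its boolean flags merely stand in for the PD, the MRs and the cm_id/QP, and the functions are not xprtrdma code. The sketch assumes, as the quoted text says, that a plain reconnect only creates a new ID and does not recreate the PD or MRs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags standing in for real verbs resources. */
struct fake_transport {
    bool have_pd;       /* protection domain */
    bool have_mrs;      /* memory regions */
    bool have_id;       /* cm_id and its QP */
};

/* Device removal: everything tied to the device has to go. */
static void fake_device_removed(struct fake_transport *t)
{
    assert(t != NULL);
    /* Torn down in the reverse of creation order: ID, MRs, PD. */
    t->have_id = false;
    t->have_mrs = false;
    t->have_pd = false;
}

/* Models a reconnect that creates only a new ID. */
static int fake_reconnect_only(struct fake_transport *t)
{
    assert(t != NULL);
    if (!t->have_pd || !t->have_mrs)
        return -1;      /* nothing to attach the new QP to */
    t->have_id = true;
    return 0;
}

/* Models the extra work insertion would need before reconnecting. */
static int fake_device_inserted(struct fake_transport *t)
{
    assert(t != NULL);
    t->have_pd = true;  /* fresh PD */
    t->have_mrs = true; /* fresh MRs */
    return fake_reconnect_only(t);
}
```

In this model, `fake_reconnect_only()` fails after `fake_device_removed()`, while `fake_device_inserted()` succeeds because it rebuilds the PD and MRs first.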
>
Got it, thanks!