From: Dan Aloni <dan.aloni@vastdata.com>
To: chuck.lever@oracle.com
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH] rpcrdma: decref EP only if ESTABLISHED and handle DEVICE_REMOVAL
Date: Sun, 5 May 2024 21:38:26 +0300
Message-ID: <20240505183826.2300475-1-dan.aloni@vastdata.com>
In-Reply-To: <20240505183628.g2hhzkrtna5asz6b@gmail.com>
With IB device bonding, when bringing down one or all of the ports, we
saw xprtrdma enter a non-recoverable state in which it is not even
possible to complete the disconnect and shut down the mount, requiring
a reboot.

A DEVICE_REMOVAL event can arrive regardless of whether the CM_ID is
connected, so ESTABLISHED may never have happened. In that case we need
to avoid the decref, and also make sure the connect path is woken up.

Fixes: 2acc5cae2923 ("xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed")
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
---
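For illustration only, here is a minimal user-space sketch of the
reference accounting described above. It is not kernel code: struct
model_ep and every model_* function are hypothetical stand-ins, and the
only assumption carried over from the patch is that ESTABLISHED takes a
reference and the disconnect exit drops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rpcrdma_ep; only the fields needed here. */
struct model_ep {
	int refcount;		/* models the endpoint's reference count */
	bool connect_ref;	/* models re_connect_ref from this patch */
};

void model_ep_get(struct model_ep *ep)
{
	ep->refcount++;
}

void model_ep_put(struct model_ep *ep)
{
	assert(ep->refcount > 0);
	ep->refcount--;
}

/* ESTABLISHED: take the connection's reference and record that we did. */
void model_established(struct model_ep *ep)
{
	model_ep_get(ep);
	ep->connect_ref = true;
}

/*
 * The "disconnected:" exit. With guarded == true the put is conditional,
 * as in this patch; with guarded == false it is unconditional, as before.
 */
void model_disconnected(struct model_ep *ep, bool guarded)
{
	if (!guarded || ep->connect_ref)
		model_ep_put(ep);
}

/* Returns the reference count left after a single DEVICE_REMOVAL event. */
int model_run(bool established_first, bool guarded)
{
	struct model_ep ep = { .refcount = 1, .connect_ref = false };

	if (established_first)
		model_established(&ep);
	model_disconnected(&ep, guarded);
	return ep.refcount;
}
```

In this model, model_run(false, false) leaves the count at 0, i.e. the
unconditional put drops a reference that ESTABLISHED never took, while
model_run(false, true) and model_run(true, true) both leave it at 1.
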
 net/sunrpc/xprtrdma/verbs.c     | 9 +++++++--
 net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4f8d7efa469f..43d7d6604c30 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -244,12 +244,15 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 		pr_info("rpcrdma: removing device %s for %pISpc\n",
 			ep->re_id->device->name, sap);
-		fallthrough;
+		ep->re_connect_status = -ENODEV;
+		wake_up_all(&ep->re_connect_wait);
+		goto disconnected;
 	case RDMA_CM_EVENT_ADDR_CHANGE:
 		ep->re_connect_status = -ENODEV;
 		goto disconnected;
 	case RDMA_CM_EVENT_ESTABLISHED:
 		rpcrdma_ep_get(ep);
+		ep->re_connect_ref = true;
 		ep->re_connect_status = 1;
 		rpcrdma_update_cm_private(ep, &event->param.conn);
 		trace_xprtrdma_inline_thresh(ep);
@@ -272,7 +275,9 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		ep->re_connect_status = -ECONNABORTED;
 disconnected:
 		rpcrdma_force_disconnect(ep);
-		return rpcrdma_ep_put(ep);
+		if (ep->re_connect_ref)
+			return rpcrdma_ep_put(ep);
+		return 0;
 	default:
 		break;
 	}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index da409450dfc0..1553ef69a844 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -84,6 +84,7 @@ struct rpcrdma_ep {
 	unsigned int		re_max_inline_recv;
 	int			re_async_rc;
 	int			re_connect_status;
+	bool			re_connect_ref;
 	atomic_t		re_receiving;
 	atomic_t		re_force_disconnect;
 	struct ib_qp_init_attr	re_attr;
--
2.39.3