From: Dan Aloni <dan.aloni@vastdata.com>
To: chuck.lever@oracle.com
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH] rpcrdma: don't decref EP if a ESTABLISHED did not happen
Date: Sun, 5 May 2024 15:49:10 +0300
Message-ID: <20240505124910.1877325-1-dan.aloni@vastdata.com>

We found a case where `RDMA_CM_EVENT_DEVICE_REMOVAL` causes a refcount
underflow.

The specific scenario that triggered this is IB device bonding, when
bringing down one of the ports, or all of them. The problem is not just
a warning print: it also leaves the mount in a non-recoverable state in
which it is not even possible to complete the disconnect and shut down
the mount, and a reboot is required. This suggests that tear-down is
also incomplete in this state.

The following trivial fix seems to work: if we did not receive an
`RDMA_CM_EVENT_ESTABLISHED`, we should not decref the EP. Otherwise
`rpcrdma_xprt_drain` kills the EP prematurely from the context of
`rpcrdma_xprt_disconnect`.
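
To make the reference pairing concrete, here is a minimal userspace C sketch of the imbalance and of the guard added by the patch. It is a simplified model, not kernel code: `struct sim_ep`, `sim_ep_get()`, `sim_ep_put()`, `sim_cm_event()` and `run_scenario()` are hypothetical names, a plain `int` stands in for the EP's kref, and the sketch assumes the EP starts with one creation reference that teardown drops with a final put.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two CM events that matter here. */
enum sim_cm_event {
	SIM_EVENT_ESTABLISHED,
	SIM_EVENT_DEVICE_REMOVAL,
};

/* Simplified model of struct rpcrdma_ep; not the kernel layout. */
struct sim_ep {
	int refcount;		/* plays the role of ep->re_kref */
	bool connect_ref;	/* plays the role of ep->re_connect_ref */
	bool underflow;		/* recorded instead of a refcount_t WARN */
};

static void sim_ep_get(struct sim_ep *ep)
{
	/* Taking a reference on a dead EP would itself be a bug. */
	assert(ep->refcount > 0);
	ep->refcount++;
}

/* Drop one reference; returns true when the last one goes away. */
static bool sim_ep_put(struct sim_ep *ep)
{
	if (ep->refcount <= 0) {
		/* refcount_t would warn "underflow; use-after-free" here. */
		ep->underflow = true;
		return false;
	}
	return --ep->refcount == 0;
}

/* Event handler; "guarded" selects the behaviour with the patch applied. */
static void sim_cm_event(struct sim_ep *ep, enum sim_cm_event ev,
			 bool guarded)
{
	switch (ev) {
	case SIM_EVENT_ESTABLISHED:
		sim_ep_get(ep);
		ep->connect_ref = true;
		break;
	case SIM_EVENT_DEVICE_REMOVAL:
		/* Unpatched: always put. Patched: put only the ESTABLISHED ref. */
		if (!guarded || ep->connect_ref)
			sim_ep_put(ep);
		break;
	}
}

/*
 * Deliver DEVICE_REMOVAL (optionally after ESTABLISHED), then model
 * teardown dropping the creation reference. Returns true on underflow.
 */
static bool run_scenario(bool established_first, bool guarded)
{
	struct sim_ep ep = { .refcount = 1 };	/* creation reference */

	if (established_first)
		sim_cm_event(&ep, SIM_EVENT_ESTABLISHED, guarded);
	sim_cm_event(&ep, SIM_EVENT_DEVICE_REMOVAL, guarded);
	sim_ep_put(&ep);	/* teardown's final put */
	return ep.underflow;
}
```

In this model, `run_scenario(false, false)` reports an underflow: the removal event drops the creation reference, so teardown's final put runs on a zero count. With the guard enabled, the removal event leaves the count alone and only teardown drops it. When ESTABLISHED arrives first, both variants stay balanced.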

Fixes: 2acc5cae2923 ("xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed")

Example crash:
rpcrdma: removing device mlx5_3 for 172.21.208.2:20049
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 60 PID: 19700 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110
Modules linked in: mst_pciconf(OE) nfsv3(OE) nfs_acl(OE) rpcsec_gss_krb5(OE) auth_rpcgss(OE) nfsv4(OE) dns_resolver rpcrdma(OE) nfs(OE) lockd(OE) grace compat_nfs_ssc(OE) snd_seq_dummy snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore uio_pci_generic uio vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_>
isst_if_mmio mei isst_if_common i2c_smbus intel_pch_thermal intel_vsec ipmi_msghandler acpi_power_meter xfs libcrc32c mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) sd_mod t10_pi sg mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci libahci drm crct10dif_pclmul mlx>
CPU: 60 PID: 19700 Comm: kworker/u132:4 Kdump: loaded Tainted: G W OE -------- --- 5.14.0-284.11.1.el9_2.x86_64 #1
Hardware name: Dell Inc. PowerEdge C6520/0TY3YW, BIOS 1.8.2 09/14/2022
Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
RIP: 0010:refcount_warn_saturate+0xba/0x110
Code: 01 01 e8 27 e1 56 00 0f 0b c3 cc cc cc cc 80 3d b8 29 9b 01 00 75 85 48 c7 c7 38 ec 04 93 c6 05 a8 29 9b 01 01 e8 04 e1 56 00 <0f> 0b c3 cc cc cc cc 80 3d 93 29 9b 01 00 0f 85 5e ff ff ff 48 c7
RSP: 0018:ff34fa4968cafe10 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ff1210404a15e000 RCX: 0000000000000027
RDX: ff12103f803998a8 RSI: 0000000000000001 RDI: ff12103f803998a0
RBP: ff1210404a15e648 R08: 0000000000000000 R09: 00000000ffff7fff
R10: ff34fa4968cafcb0 R11: ffffffff939e9608 R12: 0000000000000000
R13: dead000000000122 R14: dead000000000100 R15: ff1210404a15e928
FS: 0000000000000000(0000) GS:ff12103f80380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f170f8a5000 CR3: 00000001c3adc002 CR4: 0000000000771ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
rpcrdma_ep_put+0x42/0x50 [rpcrdma]
rpcrdma_xprt_disconnect+0x303/0x3b0 [rpcrdma]
xprt_rdma_connect_worker+0xc8/0xd0 [rpcrdma]
process_one_work+0x1e5/0x3c0
? rescuer_thread+0x3a0/0x3a0
worker_thread+0x50/0x3b0
? rescuer_thread+0x3a0/0x3a0
kthread+0xd6/0x100
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
---
net/sunrpc/xprtrdma/verbs.c | 5 ++++-
net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4f8d7efa469f..19996515da94 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -250,6 +250,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		goto disconnected;
 	case RDMA_CM_EVENT_ESTABLISHED:
 		rpcrdma_ep_get(ep);
+		ep->re_connect_ref = true;
 		ep->re_connect_status = 1;
 		rpcrdma_update_cm_private(ep, &event->param.conn);
 		trace_xprtrdma_inline_thresh(ep);
@@ -272,7 +273,9 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		ep->re_connect_status = -ECONNABORTED;
 disconnected:
 		rpcrdma_force_disconnect(ep);
-		return rpcrdma_ep_put(ep);
+		if (ep->re_connect_ref)
+			return rpcrdma_ep_put(ep);
+		return 0;
 	default:
 		break;
 	}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index da409450dfc0..1553ef69a844 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -84,6 +84,7 @@ struct rpcrdma_ep {
 	unsigned int		re_max_inline_recv;
 	int			re_async_rc;
 	int			re_connect_status;
+	bool			re_connect_ref;
 	atomic_t		re_receiving;
 	atomic_t		re_force_disconnect;
 	struct ib_qp_init_attr	re_attr;
--
2.39.3

Thread overview: 7+ messages
2024-05-05 12:49 Dan Aloni [this message]
2024-05-05 14:35 ` [PATCH] rpcrdma: don't decref EP if a ESTABLISHED did not happen Chuck Lever
2024-05-05 18:36 ` Dan Aloni
2024-05-05 18:38 ` [PATCH] rpcrdma: decref EP only if ESTABLISHED and handle DEVICE_REMOVAL Dan Aloni
2024-05-05 19:00 ` [PATCH] rpcrdma: don't decref EP if a ESTABLISHED did not happen Chuck Lever
2024-05-05 20:10 ` Dan Aloni
2024-05-12 17:51 ` kernel test robot