From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
Manjunath Patil <manjunath.b.patil@oracle.com>,
Leon Romanovsky <leon@kernel.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.15 34/57] RDMA/cm: add timeout to cm_destroy_id wait
Date: Thu, 11 Apr 2024 11:57:42 +0200 [thread overview]
Message-ID: <20240411095409.026183562@linuxfoundation.org> (raw)
In-Reply-To: <20240411095407.982258070@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manjunath Patil <manjunath.b.patil@oracle.com>
[ Upstream commit 96d9cbe2f2ff7abde021bac75eafaceabe9a51fa ]
Add timeout to cm_destroy_id, so that userspace can trigger any data
collection that would help in analyzing the cause of delay in destroying
the cm_id.
New noinline function helps dtrace/ebpf programs to hook on to it.
Existing functionality isn't changed except triggering a probe-able new
function at every timeout interval.
We have seen cases where CM messages stuck with MAD layer (either due to
software bug or faulty HCA), leading to cm_id getting stuck in the
following call stack. This patch helps in resolving such issues faster.
kernel: ... INFO: task XXXX:56778 blocked for more than 120 seconds.
...
Call Trace:
__schedule+0x2bc/0x895
schedule+0x36/0x7c
schedule_timeout+0x1f6/0x31f
? __slab_free+0x19c/0x2ba
wait_for_completion+0x12b/0x18a
? wake_up_q+0x80/0x73
cm_destroy_id+0x345/0x610 [ib_cm]
ib_destroy_cm_id+0x10/0x20 [ib_cm]
rdma_destroy_id+0xa8/0x300 [rdma_cm]
ucma_destroy_id+0x13e/0x190 [rdma_ucm]
ucma_write+0xe0/0x160 [rdma_ucm]
__vfs_write+0x3a/0x16d
vfs_write+0xb2/0x1a1
? syscall_trace_enter+0x1ce/0x2b8
SyS_write+0x5c/0xd3
do_syscall_64+0x79/0x1b9
entry_SYSCALL_64_after_hwframe+0x16d/0x0
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Link: https://lore.kernel.org/r/20240309063323.458102-1-manjunath.b.patil@oracle.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/infiniband/core/cm.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 680c3ac8cd4c0..504e1adf1997a 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -34,6 +34,7 @@ MODULE_AUTHOR("Sean Hefty");
MODULE_DESCRIPTION("InfiniBand CM");
MODULE_LICENSE("Dual BSD/GPL");
+#define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
static const char * const ibcm_rej_reason_strs[] = {
[IB_CM_REJ_NO_QP] = "no QP",
[IB_CM_REJ_NO_EEC] = "no EEC",
@@ -1032,10 +1033,20 @@ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv)
}
}
+static noinline void cm_destroy_id_wait_timeout(struct ib_cm_id *cm_id)
+{
+ struct cm_id_private *cm_id_priv;
+
+ cm_id_priv = container_of(cm_id, struct cm_id_private, id);
+ pr_err("%s: cm_id=%p timed out. state=%d refcnt=%d\n", __func__,
+ cm_id, cm_id->state, refcount_read(&cm_id_priv->refcount));
+}
+
static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
{
struct cm_id_private *cm_id_priv;
struct cm_work *work;
+ int ret;
cm_id_priv = container_of(cm_id, struct cm_id_private, id);
spin_lock_irq(&cm_id_priv->lock);
@@ -1142,7 +1153,14 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
xa_erase(&cm.local_id_table, cm_local_id(cm_id->local_id));
cm_deref_id(cm_id_priv);
- wait_for_completion(&cm_id_priv->comp);
+ do {
+ ret = wait_for_completion_timeout(&cm_id_priv->comp,
+ msecs_to_jiffies(
+ CM_DESTROY_ID_WAIT_TIMEOUT));
+ if (!ret) /* timeout happened */
+ cm_destroy_id_wait_timeout(cm_id);
+ } while (!ret);
+
while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
cm_free_work(work);
--
2.43.0
next prev parent reply other threads:[~2024-04-11 10:50 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-11 9:57 [PATCH 5.15 00/57] 5.15.155-rc1 review Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 01/57] net: dsa: fix panic when DSA master device unbinds on shutdown Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 02/57] wifi: ath9k: fix LNA selection in ath_ant_try_scan() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 03/57] batman-adv: Return directly after a failed batadv_dat_select_candidates() in batadv_dat_forward_data() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 04/57] batman-adv: Improve exception handling in batadv_throw_uevent() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 05/57] VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 06/57] panic: Flush kernel log buffer at the end Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 07/57] cpuidle: Avoid potential overflow in integer multiplication Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 08/57] arm64: dts: rockchip: fix rk3328 hdmi ports node Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 09/57] arm64: dts: rockchip: fix rk3399 " Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 10/57] ionic: set adminq irq affinity Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 11/57] pstore/zone: Add a null pointer check to the psz_kmsg_read Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 12/57] tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 13/57] net: pcs: xpcs: Return EINVAL in the internal methods Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 14/57] wifi: ath11k: decrease MHI channel buffer length to 8KB Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 15/57] btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 16/57] btrfs: export: handle invalid inode or root reference in btrfs_get_parent() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 17/57] btrfs: send: handle path ref underflow in header iterate_inode_ref() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 18/57] net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 19/57] Bluetooth: btintel: Fix null ptr deref in btintel_read_version Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 20/57] Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 21/57] pinctrl: renesas: checker: Limit cfg reg enum checks to provided IDs Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 22/57] sysv: dont call sb_bread() with pointers_lock held Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 23/57] scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 24/57] isofs: handle CDs with bad root inode but good Joliet root directory Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 25/57] media: sta2x11: fix irq handler cast Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 26/57] ALSA: firewire-lib: handle quirk to calculate payload quadlets as data block counter Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 27/57] ext4: add a hint for block bitmap corrupt state in mb_groups Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 28/57] ext4: forbid commit inconsistent quota data when errors=remount-ro Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 29/57] drm/amd/display: Fix nanosec stat overflow Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 30/57] SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 31/57] Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default" Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 32/57] libperf evlist: Avoid out-of-bounds access Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 33/57] block: prevent division by zero in blk_rq_stat_sum() Greg Kroah-Hartman
2024-04-11 9:57 ` Greg Kroah-Hartman [this message]
2024-04-11 9:57 ` [PATCH 5.15 35/57] Input: allocate keycode for Display refresh rate toggle Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 36/57] platform/x86: touchscreen_dmi: Add an extra entry for a variant of the Chuwi Vi8 tablet Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 37/57] ktest: force $buildonly = 1 for make_warnings_file test type Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 38/57] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 39/57] tools: iio: replace seekdir() in iio_generic_buffer Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 40/57] usb: typec: tcpci: add generic tcpci fallback compatible Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 41/57] usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 42/57] ASoC: soc-core.c: Skip dummy codec when adding platforms Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 43/57] fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2 Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 44/57] drivers/nvme: Add quirks for device 126f:2262 Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 45/57] fbmon: prevent division by zero in fb_videomode_from_videomode() Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 46/57] netfilter: nf_tables: release batch on table validation from abort path Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 47/57] netfilter: nf_tables: release mutex after nft_gc_seq_end " Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 48/57] netfilter: nf_tables: discard table flag update with pending basechain deletion Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 49/57] tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 50/57] gcc-plugins/stackleak: Ignore .noinstr.text and .entry.text Greg Kroah-Hartman
2024-04-11 9:57 ` [PATCH 5.15 51/57] gcc-plugins/stackleak: Avoid .head.text section Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 52/57] virtio: reenable config if freezing device failed Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 53/57] x86/mm/pat: fix VM_PAT handling in COW mappings Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 54/57] randomize_kstack: Improve entropy diffusion Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 55/57] platform/x86: intel-vbtn: Update tablet mode switch at end of probe Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 56/57] Bluetooth: btintel: Fixe build regression Greg Kroah-Hartman
2024-04-11 9:58 ` [PATCH 5.15 57/57] VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() Greg Kroah-Hartman
2024-04-11 17:12 ` [PATCH 5.15 00/57] 5.15.155-rc1 review SeongJae Park
2024-04-11 18:36 ` Easwar Hariharan
2024-04-12 8:27 ` Greg Kroah-Hartman
2024-04-11 19:13 ` Florian Fainelli
2024-04-11 23:46 ` Shuah Khan
2024-04-12 6:40 ` Shreeya Patel
2024-04-12 7:28 ` Ron Economos
2024-04-12 8:03 ` Jon Hunter
2024-04-12 10:25 ` Harshit Mogalapalli
2024-04-12 10:50 ` Greg Kroah-Hartman
2024-04-12 15:57 ` Chuck Lever III
2024-04-12 20:06 ` Calum Mackay
2024-04-12 20:11 ` Harshit Mogalapalli
2024-04-12 20:23 ` Chuck Lever
2024-04-12 21:34 ` Harshit Mogalapalli
2024-04-13 15:56 ` Chuck Lever
2024-04-14 6:13 ` Greg Kroah-Hartman
2024-04-15 13:31 ` Chuck Lever
2024-04-12 18:24 ` Naresh Kamboju
2024-04-12 22:22 ` Kelsey Steele
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240411095409.026183562@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=leon@kernel.org \
--cc=manjunath.b.patil@oracle.com \
--cc=patches@lists.linux.dev \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.