stable.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	Manjunath Patil <manjunath.b.patil@oracle.com>,
	Leon Romanovsky <leon@kernel.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.15 34/57] RDMA/cm: add timeout to cm_destroy_id wait
Date: Thu, 11 Apr 2024 11:57:42 +0200
Message-ID: <20240411095409.026183562@linuxfoundation.org>
In-Reply-To: <20240411095407.982258070@linuxfoundation.org>

5.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Manjunath Patil <manjunath.b.patil@oracle.com>

[ Upstream commit 96d9cbe2f2ff7abde021bac75eafaceabe9a51fa ]

Add a timeout to the wait in cm_destroy_id(), so that userspace can trigger
data collection that helps analyze why destroying the cm_id is delayed.

The new noinline function gives DTrace/eBPF programs a stable point to hook
onto. Existing behaviour is unchanged, except that this new probe-able
function is called at every timeout interval (CM_DESTROY_ID_WAIT_TIMEOUT,
10000 msecs).

We have seen cases where CM messages got stuck in the MAD layer (due to
either a software bug or a faulty HCA), leaving the cm_id stuck in the
following call stack. This patch helps resolve such issues faster.

kernel: ... INFO: task XXXX:56778 blocked for more than 120 seconds.
...
	Call Trace:
	__schedule+0x2bc/0x895
	schedule+0x36/0x7c
	schedule_timeout+0x1f6/0x31f
	? __slab_free+0x19c/0x2ba
	wait_for_completion+0x12b/0x18a
	? wake_up_q+0x80/0x73
	cm_destroy_id+0x345/0x610 [ib_cm]
	ib_destroy_cm_id+0x10/0x20 [ib_cm]
	rdma_destroy_id+0xa8/0x300 [rdma_cm]
	ucma_destroy_id+0x13e/0x190 [rdma_ucm]
	ucma_write+0xe0/0x160 [rdma_ucm]
	__vfs_write+0x3a/0x16d
	vfs_write+0xb2/0x1a1
	? syscall_trace_enter+0x1ce/0x2b8
	SyS_write+0x5c/0xd3
	do_syscall_64+0x79/0x1b9
	entry_SYSCALL_64_after_hwframe+0x16d/0x0

Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Link: https://lore.kernel.org/r/20240309063323.458102-1-manjunath.b.patil@oracle.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/infiniband/core/cm.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 680c3ac8cd4c0..504e1adf1997a 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -34,6 +34,7 @@ MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("InfiniBand CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
+#define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
 static const char * const ibcm_rej_reason_strs[] = {
 	[IB_CM_REJ_NO_QP]			= "no QP",
 	[IB_CM_REJ_NO_EEC]			= "no EEC",
@@ -1032,10 +1033,20 @@ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv)
 	}
 }
 
+static noinline void cm_destroy_id_wait_timeout(struct ib_cm_id *cm_id)
+{
+	struct cm_id_private *cm_id_priv;
+
+	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
+	pr_err("%s: cm_id=%p timed out. state=%d refcnt=%d\n", __func__,
+	       cm_id, cm_id->state, refcount_read(&cm_id_priv->refcount));
+}
+
 static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 {
 	struct cm_id_private *cm_id_priv;
 	struct cm_work *work;
+	int ret;
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 	spin_lock_irq(&cm_id_priv->lock);
@@ -1142,7 +1153,14 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 
 	xa_erase(&cm.local_id_table, cm_local_id(cm_id->local_id));
 	cm_deref_id(cm_id_priv);
-	wait_for_completion(&cm_id_priv->comp);
+	do {
+		ret = wait_for_completion_timeout(&cm_id_priv->comp,
+						  msecs_to_jiffies(
+						  CM_DESTROY_ID_WAIT_TIMEOUT));
+		if (!ret) /* timeout happened */
+			cm_destroy_id_wait_timeout(cm_id);
+	} while (!ret);
+
 	while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
 		cm_free_work(work);
 
-- 
2.43.0




Thread overview: 78+ messages
2024-04-11  9:57 [PATCH 5.15 00/57] 5.15.155-rc1 review Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 01/57] net: dsa: fix panic when DSA master device unbinds on shutdown Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 02/57] wifi: ath9k: fix LNA selection in ath_ant_try_scan() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 03/57] batman-adv: Return directly after a failed batadv_dat_select_candidates() in batadv_dat_forward_data() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 04/57] batman-adv: Improve exception handling in batadv_throw_uevent() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 05/57] VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 06/57] panic: Flush kernel log buffer at the end Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 07/57] cpuidle: Avoid potential overflow in integer multiplication Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 08/57] arm64: dts: rockchip: fix rk3328 hdmi ports node Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 09/57] arm64: dts: rockchip: fix rk3399 " Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 10/57] ionic: set adminq irq affinity Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 11/57] pstore/zone: Add a null pointer check to the psz_kmsg_read Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 12/57] tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 13/57] net: pcs: xpcs: Return EINVAL in the internal methods Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 14/57] wifi: ath11k: decrease MHI channel buffer length to 8KB Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 15/57] btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 16/57] btrfs: export: handle invalid inode or root reference in btrfs_get_parent() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 17/57] btrfs: send: handle path ref underflow in header iterate_inode_ref() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 18/57] net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 19/57] Bluetooth: btintel: Fix null ptr deref in btintel_read_version Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 20/57] Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 21/57] pinctrl: renesas: checker: Limit cfg reg enum checks to provided IDs Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 22/57] sysv: dont call sb_bread() with pointers_lock held Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 23/57] scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 24/57] isofs: handle CDs with bad root inode but good Joliet root directory Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 25/57] media: sta2x11: fix irq handler cast Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 26/57] ALSA: firewire-lib: handle quirk to calculate payload quadlets as data block counter Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 27/57] ext4: add a hint for block bitmap corrupt state in mb_groups Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 28/57] ext4: forbid commit inconsistent quota data when errors=remount-ro Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 29/57] drm/amd/display: Fix nanosec stat overflow Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 30/57] SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 31/57] Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default" Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 32/57] libperf evlist: Avoid out-of-bounds access Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 33/57] block: prevent division by zero in blk_rq_stat_sum() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 34/57] RDMA/cm: add timeout to cm_destroy_id wait Greg Kroah-Hartman (this message)
2024-04-11  9:57 ` [PATCH 5.15 35/57] Input: allocate keycode for Display refresh rate toggle Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 36/57] platform/x86: touchscreen_dmi: Add an extra entry for a variant of the Chuwi Vi8 tablet Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 37/57] ktest: force $buildonly = 1 for make_warnings_file test type Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 38/57] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 39/57] tools: iio: replace seekdir() in iio_generic_buffer Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 40/57] usb: typec: tcpci: add generic tcpci fallback compatible Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 41/57] usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 42/57] ASoC: soc-core.c: Skip dummy codec when adding platforms Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 43/57] fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2 Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 44/57] drivers/nvme: Add quirks for device 126f:2262 Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 45/57] fbmon: prevent division by zero in fb_videomode_from_videomode() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 46/57] netfilter: nf_tables: release batch on table validation from abort path Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 47/57] netfilter: nf_tables: release mutex after nft_gc_seq_end " Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 48/57] netfilter: nf_tables: discard table flag update with pending basechain deletion Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 49/57] tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 50/57] gcc-plugins/stackleak: Ignore .noinstr.text and .entry.text Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 5.15 51/57] gcc-plugins/stackleak: Avoid .head.text section Greg Kroah-Hartman
2024-04-11  9:58 ` [PATCH 5.15 52/57] virtio: reenable config if freezing device failed Greg Kroah-Hartman
2024-04-11  9:58 ` [PATCH 5.15 53/57] x86/mm/pat: fix VM_PAT handling in COW mappings Greg Kroah-Hartman
2024-04-11  9:58 ` [PATCH 5.15 54/57] randomize_kstack: Improve entropy diffusion Greg Kroah-Hartman
2024-04-11  9:58 ` [PATCH 5.15 55/57] platform/x86: intel-vbtn: Update tablet mode switch at end of probe Greg Kroah-Hartman
2024-04-11  9:58 ` [PATCH 5.15 56/57] Bluetooth: btintel: Fixe build regression Greg Kroah-Hartman
2024-04-11  9:58 ` [PATCH 5.15 57/57] VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() Greg Kroah-Hartman
2024-04-11 17:12 ` [PATCH 5.15 00/57] 5.15.155-rc1 review SeongJae Park
2024-04-11 18:36 ` Easwar Hariharan
2024-04-12  8:27   ` Greg Kroah-Hartman
2024-04-11 19:13 ` Florian Fainelli
2024-04-11 23:46 ` Shuah Khan
2024-04-12  6:40 ` Shreeya Patel
2024-04-12  7:28 ` Ron Economos
2024-04-12  8:03 ` Jon Hunter
2024-04-12 10:25 ` Harshit Mogalapalli
2024-04-12 10:50   ` Greg Kroah-Hartman
2024-04-12 15:57   ` Chuck Lever III
2024-04-12 20:06     ` Calum Mackay
2024-04-12 20:11     ` Harshit Mogalapalli
2024-04-12 20:23       ` Chuck Lever
2024-04-12 21:34         ` Harshit Mogalapalli
2024-04-13 15:56           ` Chuck Lever
2024-04-14  6:13             ` Greg Kroah-Hartman
2024-04-15 13:31               ` Chuck Lever
2024-04-12 18:24 ` Naresh Kamboju
2024-04-12 22:22 ` Kelsey Steele