Archive-only list for patches
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	Manjunath Patil <manjunath.b.patil@oracle.com>,
	Leon Romanovsky <leon@kernel.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 50/83] RDMA/cm: add timeout to cm_destroy_id wait
Date: Thu, 11 Apr 2024 11:57:22 +0200
Message-ID: <20240411095414.190585160@linuxfoundation.org>
In-Reply-To: <20240411095412.671665933@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Manjunath Patil <manjunath.b.patil@oracle.com>

[ Upstream commit 96d9cbe2f2ff7abde021bac75eafaceabe9a51fa ]

Add a timeout to the wait in cm_destroy_id(), so that userspace can
trigger whatever data collection would help analyze why destroying the
cm_id is delayed.
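As a sketch of the userspace side, the hypothetical helper below recognizes the line that the patched kernel logs through the pr_err() call in cm_destroy_id_wait_timeout() (format taken from the diff in this patch). The function name parse_cm_timeout_line() is an assumption for illustration, not an existing tool. It ignores any log prefix before the tag and skips the %p value as an opaque token, since the kernel may print pointers hashed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Tag produced by "%s: cm_id=" with __func__ in cm_destroy_id_wait_timeout(). */
static const char cm_timeout_tag[] = "cm_destroy_id_wait_timeout: cm_id=";

/*
 * Hypothetical helper: return 1 and fill *state and *refcnt if `line`
 * contains the timeout message, otherwise return 0. Any printk/dmesg
 * prefix before the tag is ignored; the pointer value is skipped.
 */
static int parse_cm_timeout_line(const char *line, int *state, int *refcnt)
{
	assert(line != NULL && state != NULL && refcnt != NULL);

	const char *p = strstr(line, cm_timeout_tag);
	if (p == NULL)
		return 0;

	/* Remainder looks like "<ptr> timed out. state=<n> refcnt=<n>". */
	return sscanf(p + sizeof(cm_timeout_tag) - 1,
		      "%*s timed out. state=%d refcnt=%d", state, refcnt) == 2;
}
```

A watcher reading kernel log lines (for example the output of `dmesg --follow`) could call this on each line and start collecting diagnostics on the first match.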

The new noinline function gives dtrace/eBPF programs a stable point to
hook into. Existing behavior is unchanged, except that at every timeout
interval the new probeable function is called and logs the cm_id state
and reference count.
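The wait pattern introduced here (a bounded wait repeated until completion, with a noinline hook called on each expiry) can be modeled in userspace. This is an analogue, not kernel code: pthread_cond_timedwait() stands in for wait_for_completion_timeout(), and completion_sketch, wait_timeout_hook(), wait_with_periodic_hook() and run_demo() are hypothetical names chosen for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion_sketch {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static int hook_calls; /* how many times the timeout hook fired */

/* noinline so a probe (uprobe, debugger breakpoint) always has a symbol. */
static __attribute__((noinline)) void wait_timeout_hook(struct completion_sketch *c)
{
	(void)c;
	hook_calls++;
}

static void complete_sketch(struct completion_sketch *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Wait until c->done, calling the hook once per expired timeout_ms interval. */
static void wait_with_periodic_hook(struct completion_sketch *c, long timeout_ms)
{
	assert(timeout_ms > 0);
	pthread_mutex_lock(&c->lock);
	while (!c->done) {
		struct timespec deadline;
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += timeout_ms / 1000;
		deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}
		/* Keep waiting through spurious wakeups until done or deadline. */
		int rc = 0;
		while (!c->done && rc != ETIMEDOUT)
			rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
		if (!c->done) {
			/* Timed out: run the hook without holding the lock. */
			pthread_mutex_unlock(&c->lock);
			wait_timeout_hook(c);
			pthread_mutex_lock(&c->lock);
		}
	}
	pthread_mutex_unlock(&c->lock);
}

struct demo_args {
	struct completion_sketch *c;
	long delay_ms;
};

static void *completer(void *arg)
{
	struct demo_args *a = arg;
	struct timespec d = { a->delay_ms / 1000, (a->delay_ms % 1000) * 1000000L };
	nanosleep(&d, NULL);
	complete_sketch(a->c);
	return NULL;
}

/* Complete after delay_ms on another thread; return how often the hook fired. */
static int run_demo(long timeout_ms, long delay_ms)
{
	struct completion_sketch c = { PTHREAD_MUTEX_INITIALIZER,
				       PTHREAD_COND_INITIALIZER, false };
	struct demo_args a = { &c, delay_ms };
	pthread_t t;

	hook_calls = 0;
	if (pthread_create(&t, NULL, completer, &a) != 0)
		return -1;
	wait_with_periodic_hook(&c, timeout_ms);
	pthread_join(t, NULL);
	return hook_calls;
}
```

As in the patch, a completion that arrives before the first timeout never reaches the hook, and a slow completion is still waited for indefinitely; the timeout only adds a periodic, probeable side effect.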

We have seen cases where CM messages got stuck in the MAD layer (due to
either a software bug or a faulty HCA), leaving the cm_id stuck in the
following call stack. This patch helps resolve such issues faster.

kernel: ... INFO: task XXXX:56778 blocked for more than 120 seconds.
...
	Call Trace:
	__schedule+0x2bc/0x895
	schedule+0x36/0x7c
	schedule_timeout+0x1f6/0x31f
	? __slab_free+0x19c/0x2ba
	wait_for_completion+0x12b/0x18a
	? wake_up_q+0x80/0x73
	cm_destroy_id+0x345/0x610 [ib_cm]
	ib_destroy_cm_id+0x10/0x20 [ib_cm]
	rdma_destroy_id+0xa8/0x300 [rdma_cm]
	ucma_destroy_id+0x13e/0x190 [rdma_ucm]
	ucma_write+0xe0/0x160 [rdma_ucm]
	__vfs_write+0x3a/0x16d
	vfs_write+0xb2/0x1a1
	? syscall_trace_enter+0x1ce/0x2b8
	SyS_write+0x5c/0xd3
	do_syscall_64+0x79/0x1b9
	entry_SYSCALL_64_after_hwframe+0x16d/0x0

Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Link: https://lore.kernel.org/r/20240309063323.458102-1-manjunath.b.patil@oracle.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/infiniband/core/cm.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index b7f9023442890..462a10d6a5762 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -34,6 +34,7 @@ MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("InfiniBand CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
+#define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
 static const char * const ibcm_rej_reason_strs[] = {
 	[IB_CM_REJ_NO_QP]			= "no QP",
 	[IB_CM_REJ_NO_EEC]			= "no EEC",
@@ -1025,10 +1026,20 @@ static void cm_reset_to_idle(struct cm_id_private *cm_id_priv)
 	}
 }
 
+static noinline void cm_destroy_id_wait_timeout(struct ib_cm_id *cm_id)
+{
+	struct cm_id_private *cm_id_priv;
+
+	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
+	pr_err("%s: cm_id=%p timed out. state=%d refcnt=%d\n", __func__,
+	       cm_id, cm_id->state, refcount_read(&cm_id_priv->refcount));
+}
+
 static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 {
 	struct cm_id_private *cm_id_priv;
 	struct cm_work *work;
+	int ret;
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 	spin_lock_irq(&cm_id_priv->lock);
@@ -1135,7 +1146,14 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 
 	xa_erase(&cm.local_id_table, cm_local_id(cm_id->local_id));
 	cm_deref_id(cm_id_priv);
-	wait_for_completion(&cm_id_priv->comp);
+	do {
+		ret = wait_for_completion_timeout(&cm_id_priv->comp,
+						  msecs_to_jiffies(
+						  CM_DESTROY_ID_WAIT_TIMEOUT));
+		if (!ret) /* timeout happened */
+			cm_destroy_id_wait_timeout(cm_id);
+	} while (!ret);
+
 	while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
 		cm_free_work(work);
 
-- 
2.43.0





Thread overview: 98+ messages
2024-04-11  9:56 [PATCH 6.1 00/83] 6.1.86-rc1 review Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 01/83] wifi: ath9k: fix LNA selection in ath_ant_try_scan() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 02/83] bnx2x: Fix firmware version string character counts Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 03/83] batman-adv: Return directly after a failed batadv_dat_select_candidates() in batadv_dat_forward_data() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 04/83] batman-adv: Improve exception handling in batadv_throw_uevent() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 05/83] wifi: rtw89: pci: enlarge RX DMA buffer to consider size of RX descriptor Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 06/83] VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 07/83] wifi: iwlwifi: pcie: Add the PCI device id for new hardware Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 08/83] panic: Flush kernel log buffer at the end Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 09/83] cpuidle: Avoid potential overflow in integer multiplication Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 10/83] arm64: dts: rockchip: fix rk3328 hdmi ports node Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 11/83] arm64: dts: rockchip: fix rk3399 " Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 12/83] ionic: set adminq irq affinity Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 13/83] net: skbuff: add overflow debug check to pull/push helpers Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 14/83] firmware: tegra: bpmp: Return directly after a failed kzalloc() in get_filename() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 15/83] wifi: brcmfmac: Add DMI nvram filename quirk for ACEPC W5 Pro Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 16/83] pstore/zone: Add a null pointer check to the psz_kmsg_read Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 17/83] tools/power x86_energy_perf_policy: Fix file leak in get_pkg_num() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 18/83] net: pcs: xpcs: Return EINVAL in the internal methods Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 19/83] dma-direct: Leak pages on dma_set_decrypted() failure Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 20/83] wifi: ath11k: decrease MHI channel buffer length to 8KB Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 21/83] cpufreq: Dont unregister cpufreq cooling on CPU hotplug Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 22/83] btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 23/83] btrfs: export: handle invalid inode or root reference in btrfs_get_parent() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 24/83] btrfs: send: handle path ref underflow in header iterate_inode_ref() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 25/83] ice: use relative VSI index for VFs instead of PF VSI number Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 26/83] net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list() Greg Kroah-Hartman
2024-04-11  9:56 ` [PATCH 6.1 27/83] Bluetooth: btintel: Fix null ptr deref in btintel_read_version Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 28/83] Bluetooth: btmtk: Add MODULE_FIRMWARE() for MT7922 Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 29/83] drm/vc4: dont check if plane->state->fb == state->fb Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 30/83] Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 31/83] drm: panel-orientation-quirks: Add quirk for GPD Win Mini Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 32/83] pinctrl: renesas: checker: Limit cfg reg enum checks to provided IDs Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 33/83] sysv: dont call sb_bread() with pointers_lock held Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 34/83] scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 35/83] isofs: handle CDs with bad root inode but good Joliet root directory Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 36/83] ASoC: Intel: common: DMI remap for rebranded Intel NUC M15 (LAPRC710) laptops Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 37/83] rcu-tasks: Repair RCU Tasks Trace quiescence check Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 38/83] Julia Lawall reported this null pointer dereference, this should fix it Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 39/83] media: sta2x11: fix irq handler cast Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 40/83] ALSA: firewire-lib: handle quirk to calculate payload quadlets as data block counter Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 41/83] ext4: add a hint for block bitmap corrupt state in mb_groups Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 42/83] ext4: forbid commit inconsistent quota data when errors=remount-ro Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 43/83] drm/amd/display: Fix nanosec stat overflow Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 44/83] drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 45/83] SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 46/83] Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default" Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 47/83] libperf evlist: Avoid out-of-bounds access Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 48/83] input/touchscreen: imagis: Correct the maximum touch area value Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 49/83] block: prevent division by zero in blk_rq_stat_sum() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 50/83] RDMA/cm: add timeout to cm_destroy_id wait Greg Kroah-Hartman [this message]
2024-04-11  9:57 ` [PATCH 6.1 51/83] Input: imagis - use FIELD_GET where applicable Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 52/83] Input: allocate keycode for Display refresh rate toggle Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 53/83] platform/x86: touchscreen_dmi: Add an extra entry for a variant of the Chuwi Vi8 tablet Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 54/83] perf/x86/amd/lbr: Discard erroneous branch entries Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 55/83] ktest: force $buildonly = 1 for make_warnings_file test type Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 56/83] ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent environment Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 57/83] tools: iio: replace seekdir() in iio_generic_buffer Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 58/83] bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 59/83] usb: gadget: uvc: mark incomplete frames with UVC_STREAM_ERR Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 60/83] thunderbolt: Keep the domain powered when USB4 port is in redrive mode Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 61/83] usb: typec: tcpci: add generic tcpci fallback compatible Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 62/83] usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 63/83] thermal/of: Assume polling-delay(-passive) 0 when absent Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 64/83] ASoC: soc-core.c: Skip dummy codec when adding platforms Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 65/83] fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2 Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 66/83] io_uring: clear opcode specific data for an early failure Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 67/83] drivers/nvme: Add quirks for device 126f:2262 Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 68/83] fbmon: prevent division by zero in fb_videomode_from_videomode() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 69/83] netfilter: nf_tables: release batch on table validation from abort path Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 70/83] netfilter: nf_tables: release mutex after nft_gc_seq_end " Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 71/83] netfilter: nf_tables: discard table flag update with pending basechain deletion Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 72/83] tty: n_gsm: require CAP_NET_ADMIN to attach N_GSM0710 ldisc Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 73/83] gcc-plugins/stackleak: Avoid .head.text section Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 74/83] Revert "scsi: sd: usb_storage: uas: Access media prior to querying device properties" Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 75/83] Revert "scsi: core: Add struct for args to execution functions" Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 76/83] scsi: sd: usb_storage: uas: Access media prior to querying device properties Greg Kroah-Hartman
2024-04-11 11:30   ` Cyril Brulebois
2024-04-11  9:57 ` [PATCH 6.1 77/83] virtio: reenable config if freezing device failed Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 78/83] randomize_kstack: Improve entropy diffusion Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 79/83] platform/x86: intel-vbtn: Update tablet mode switch at end of probe Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 80/83] Bluetooth: btintel: Fixe build regression Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 81/83] net: mpls: error out if inner headers are not set Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 82/83] VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler() Greg Kroah-Hartman
2024-04-11  9:57 ` [PATCH 6.1 83/83] Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()" Greg Kroah-Hartman
2024-04-11 11:59 ` [PATCH 6.1 00/83] 6.1.86-rc1 review Pavel Machek
2024-04-11 14:15   ` Greg Kroah-Hartman
2024-04-12 19:31     ` Pavel Machek
2024-04-11 12:04 ` Pavel Machek
2024-04-11 17:13 ` SeongJae Park
2024-04-11 19:29 ` Florian Fainelli
2024-04-11 23:43 ` Shuah Khan
2024-04-12  6:38 ` Shreeya Patel
2024-04-12  7:25 ` Ron Economos
2024-04-12  8:04 ` Jon Hunter
2024-04-12 15:14 ` Naresh Kamboju
2024-04-12 20:15 ` Mateusz Jończyk
2024-04-12 22:23 ` Kelsey Steele
