stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	Junxian Huang <huangjunxian6@hisilicon.com>,
	Leon Romanovsky <leon@kernel.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.6 15/77] RDMA/hns: Fix soft lockup during bt pages loop
Date: Tue, 25 Mar 2025 08:22:10 -0400	[thread overview]
Message-ID: <20250325122144.737581132@linuxfoundation.org> (raw)
In-Reply-To: <20250325122144.259256924@linuxfoundation.org>

6.6-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Junxian Huang <huangjunxian6@hisilicon.com>

[ Upstream commit 25655580136de59ec89f09089dd28008ea440fc9 ]

Driver runs a for-loop when allocating bt pages and mapping them with
buffer pages. When a large buffer (e.g. MR over 100GB) is being allocated,
it may require a considerable loop count. This will lead to soft lockup:

        watchdog: BUG: soft lockup - CPU#27 stuck for 22s!
        ...
        Call trace:
         hem_list_alloc_mid_bt+0x124/0x394 [hns_roce_hw_v2]
         hns_roce_hem_list_request+0xf8/0x160 [hns_roce_hw_v2]
         hns_roce_mtr_create+0x2e4/0x360 [hns_roce_hw_v2]
         alloc_mr_pbl+0xd4/0x17c [hns_roce_hw_v2]
         hns_roce_reg_user_mr+0xf8/0x190 [hns_roce_hw_v2]
         ib_uverbs_reg_mr+0x118/0x290

        watchdog: BUG: soft lockup - CPU#35 stuck for 23s!
        ...
        Call trace:
         hns_roce_hem_list_find_mtt+0x7c/0xb0 [hns_roce_hw_v2]
         mtr_map_bufs+0xc4/0x204 [hns_roce_hw_v2]
         hns_roce_mtr_create+0x31c/0x3c4 [hns_roce_hw_v2]
         alloc_mr_pbl+0xb0/0x160 [hns_roce_hw_v2]
         hns_roce_reg_user_mr+0x108/0x1c0 [hns_roce_hw_v2]
         ib_uverbs_reg_mr+0x120/0x2bc

Add a cond_resched() to fix soft lockup during these loops. In order not
to affect the allocation performance of normal-size buffer, set the loop
count of a 100GB MR as the threshold to call cond_resched().

Fixes: 38389eaa4db1 ("RDMA/hns: Add mtr support for mixed multihop addressing")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250311084857.3803665-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/infiniband/hw/hns/hns_roce_hem.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hem.c b/drivers/infiniband/hw/hns/hns_roce_hem.c
index 51ab6041ca91b..f13016dc8016a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hem.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hem.c
@@ -1416,6 +1416,11 @@ static int hem_list_alloc_root_bt(struct hns_roce_dev *hr_dev,
 	return ret;
 }
 
+/* This is the bottom bt pages number of a 100G MR on 4K OS, assuming
+ * the bt page size is not expanded by cal_best_bt_pg_sz()
+ */
+#define RESCHED_LOOP_CNT_THRESHOLD_ON_4K 12800
+
 /* construct the base address table and link them by address hop config */
 int hns_roce_hem_list_request(struct hns_roce_dev *hr_dev,
 			      struct hns_roce_hem_list *hem_list,
@@ -1424,6 +1429,7 @@ int hns_roce_hem_list_request(struct hns_roce_dev *hr_dev,
 {
 	const struct hns_roce_buf_region *r;
 	int ofs, end;
+	int loop;
 	int unit;
 	int ret;
 	int i;
@@ -1441,7 +1447,10 @@ int hns_roce_hem_list_request(struct hns_roce_dev *hr_dev,
 			continue;
 
 		end = r->offset + r->count;
-		for (ofs = r->offset; ofs < end; ofs += unit) {
+		for (ofs = r->offset, loop = 1; ofs < end; ofs += unit, loop++) {
+			if (!(loop % RESCHED_LOOP_CNT_THRESHOLD_ON_4K))
+				cond_resched();
+
 			ret = hem_list_alloc_mid_bt(hr_dev, r, unit, ofs,
 						    hem_list->mid_bt[i],
 						    &hem_list->btm_bt);
@@ -1498,9 +1507,14 @@ void *hns_roce_hem_list_find_mtt(struct hns_roce_dev *hr_dev,
 	struct list_head *head = &hem_list->btm_bt;
 	struct hns_roce_hem_item *hem, *temp_hem;
 	void *cpu_base = NULL;
+	int loop = 1;
 	int nr = 0;
 
 	list_for_each_entry_safe(hem, temp_hem, head, sibling) {
+		if (!(loop % RESCHED_LOOP_CNT_THRESHOLD_ON_4K))
+			cond_resched();
+		loop++;
+
 		if (hem_list_page_is_in_range(hem, offset)) {
 			nr = offset - hem->start;
 			cpu_base = hem->addr + nr * BA_BYTE_LEN;
-- 
2.39.5




  parent reply	other threads:[~2025-03-25 12:35 UTC|newest]

Thread overview: 86+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-25 12:21 [PATCH 6.6 00/77] 6.6.85-rc1 review Greg Kroah-Hartman
2025-03-25 12:21 ` [PATCH 6.6 01/77] firmware: imx-scu: fix OF node leak in .probe() Greg Kroah-Hartman
2025-03-25 12:21 ` [PATCH 6.6 02/77] arm64: dts: freescale: tqma8mpql: Fix vqmmc-supply Greg Kroah-Hartman
2025-03-25 12:21 ` [PATCH 6.6 03/77] xfrm: fix tunnel mode TX datapath in packet offload mode Greg Kroah-Hartman
2025-03-25 12:21 ` [PATCH 6.6 04/77] xfrm_output: Force software GSO only in tunnel mode Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 05/77] soc: imx8m: Remove global soc_uid Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 06/77] soc: imx8m: Use devm_* to simplify probe failure handling Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 07/77] soc: imx8m: Unregister cpufreq and soc dev in cleanup path Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 08/77] ARM: dts: bcm2711: PL011 UARTs are actually r1p5 Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 09/77] arm64: dts: rockchip: Remove undocumented sdmmc property from lubancat-1 Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 10/77] RDMA/bnxt_re: Add missing paranthesis in map_qp_id_to_tbl_indx Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 11/77] RDMA/mlx5: Handle errors returned from mlx5r_ib_rate() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 12/77] ARM: OMAP1: select CONFIG_GENERIC_IRQ_CHIP Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 13/77] ARM: dts: bcm2711: Dont mark timer regs unconfigured Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 14/77] RDMA/bnxt_re: Avoid clearing VLAN_ID mask in modify qp path Greg Kroah-Hartman
2025-03-25 12:22 ` Greg Kroah-Hartman [this message]
2025-03-25 12:22 ` [PATCH 6.6 16/77] RDMA/hns: Fix unmatched condition in error path of alloc_user_qp_db() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 17/77] RDMA/hns: Fix a missing rollback in error path of hns_roce_create_qp_common() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 18/77] RDMA/hns: Fix wrong value of max_sge_rd Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 19/77] Bluetooth: Fix error code in chan_alloc_skb_cb() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 20/77] Bluetooth: hci_event: Fix connection regression between LE and non-LE adapters Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 21/77] accel/qaic: Fix possible data corruption in BOs > 2G Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 22/77] ARM: davinci: da850: fix selecting ARCH_DAVINCI_DA8XX Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 23/77] ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 24/77] ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 25/77] net: atm: fix use after free in lec_send() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 26/77] net: lwtunnel: fix recursion loops Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 27/77] net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 28/77] Revert "gre: Fix IPv6 link-local address generation." Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 29/77] i2c: omap: fix IRQ storms Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 30/77] can: rcar_canfd: Fix page entries in the AFL list Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 31/77] can: ucan: fix out of bound read in strscpy() source Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 32/77] can: flexcan: only change CAN state when link up in system PM Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 33/77] can: flexcan: disable transceiver during " Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 34/77] drm/v3d: Dont run jobs that have errors flagged in its fence Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 35/77] riscv: dts: starfive: Fix a typo in StarFive JH7110 pin function definitions Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 36/77] regulator: dummy: force synchronous probing Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 37/77] regulator: check that dummy regulator has been probed before using it Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 38/77] accel/qaic: Fix integer overflow in qaic_validate_req() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 39/77] arm64: dts: freescale: imx8mp-verdin-dahlia: add Microphone Jack to sound card Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 40/77] arm64: dts: freescale: imx8mm-verdin-dahlia: " Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 41/77] arm64: dts: rockchip: fix pinmux of UART0 for PX30 Ringneck on Haikou Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 42/77] arm64: dts: rockchip: Add missing PCIe supplies to RockPro64 board dtsi Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 43/77] mmc: sdhci-brcmstb: add cqhci suspend/resume to PM ops Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 44/77] mmc: atmel-mci: Add missing clk_disable_unprepare() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 45/77] mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 46/77] mm/migrate: fix shmem xarray update during migration Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 47/77] proc: fix UAF in proc_get_inode() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 48/77] memcg: drain obj stock on cpu hotplug teardown Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 49/77] ARM: dts: imx6qdl-apalis: Fix poweroff on Apalis iMX6 Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 50/77] ARM: shmobile: smp: Enforce shmobile_smp_* alignment Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 51/77] efi/libstub: Avoid physical address 0x0 when doing random allocation Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 52/77] xsk: fix an integer overflow in xp_create_and_assign_umem() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 53/77] batman-adv: Ignore own maximum aggregation size during RX Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 54/77] soc: qcom: pdr: Fix the potential deadlock Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 55/77] drm/radeon: fix uninitialized size issue in radeon_vce_cs_parse() Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 56/77] drm/sched: Fix fence reference count leak Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 57/77] drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 58/77] drm/amdgpu: Fix JPEG video caps max size for navi1x and raven Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 59/77] drm/amd/display: should support dmub hw lock on Replay Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 60/77] drm/amd/display: Use HW lock mgr for PSR1 when only one eDP Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 61/77] ksmbd: fix incorrect validation for num_aces field of smb_acl Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 62/77] mptcp: Fix data stream corruption in the address announcement Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 63/77] KVM: arm64: Calculate cptr_el2 traps on activating traps Greg Kroah-Hartman
2025-03-25 12:22 ` [PATCH 6.6 64/77] KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 65/77] KVM: arm64: Remove host FPSIMD saving for non-protected KVM Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 66/77] KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 67/77] KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 68/77] KVM: arm64: Refactor exit handlers Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 69/77] KVM: arm64: Mark some header functions as inline Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 70/77] KVM: arm64: Eagerly switch ZCR_EL{1,2} Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 71/77] arm64: dts: rockchip: fix u2phy1_host status for NanoPi R4S Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 72/77] Revert "sched/core: Reduce cost of sched_move_task when config autogroup" Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 73/77] btrfs: make sure that WRITTEN is set on all metadata blocks Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 74/77] bnxt_en: Fix receive ring space parameters when XDP is active Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 75/77] wifi: iwlwifi: support BIOS override for 5G9 in CA also in LARI version 8 Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 76/77] wifi: iwlwifi: mvm: ensure offloading TID queue exists Greg Kroah-Hartman
2025-03-25 12:23 ` [PATCH 6.6 77/77] netfilter: nft_counter: Use u64_stats_t for statistic Greg Kroah-Hartman
2025-03-25 15:07 ` [PATCH 6.6 00/77] 6.6.85-rc1 review Naresh Kamboju
2025-03-25 16:07   ` Dragan Simic
2025-03-25 23:36     ` Greg Kroah-Hartman
2025-03-26  2:33     ` Harshit Mogalapalli
2025-03-26  3:56       ` Dragan Simic
2025-03-26 15:38         ` Greg Kroah-Hartman
2025-03-27  7:12           ` Dragan Simic
2025-03-25 17:25 ` Florian Fainelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250325122144.737581132@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=huangjunxian6@hisilicon.com \
    --cc=leon@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).