From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Saravanan Vajravel <saravanan.vajravel@broadcom.com>,
	Leon Romanovsky <leon@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	markzhang@nvidia.com, linux-rdma@vger.kernel.org
Subject: [PATCH AUTOSEL 6.11 22/76] RDMA/mad: Improve handling of timed out WRs of mad agent
Date: Fri,  4 Oct 2024 14:16:39 -0400
Message-ID: <20241004181828.3669209-22-sashal@kernel.org>
In-Reply-To: <20241004181828.3669209-1-sashal@kernel.org>

From: Saravanan Vajravel <saravanan.vajravel@broadcom.com>

[ Upstream commit 2a777679b8ccd09a9a65ea0716ef10365179caac ]

The current timeout handler of the mad agent acquires and releases the
mad_agent_priv lock for every timed-out WR. This causes heavy lock
contention when a large number of WRs must be handled inside the
timeout handler.

This leads to a soft lockup, with the trace below, in some use cases
where the rdma-cm path is used to establish connections between peer
nodes.

Trace:
-----
 BUG: soft lockup - CPU#4 stuck for 26s! [kworker/u128:3:19767]
 CPU: 4 PID: 19767 Comm: kworker/u128:3 Kdump: loaded Tainted: G OE
     -------  ---  5.14.0-427.13.1.el9_4.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.4.8 11/26/2019
 Workqueue: ib_mad1 timeout_sends [ib_core]
 RIP: 0010:__do_softirq+0x78/0x2ac
 RSP: 0018:ffffb253449e4f98 EFLAGS: 00000246
 RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 000000000000001f
 RDX: 000000000000001d RSI: 000000003d1879ab RDI: fff363b66fd3a86b
 RBP: ffffb253604cbcd8 R08: 0000009065635f3b R09: 0000000000000000
 R10: 0000000000000040 R11: ffffb253449e4ff8 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000040
 FS:  0000000000000000(0000) GS:ffff8caa1fc80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fd9ec9db900 CR3: 0000000891934006 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <IRQ>
  ? show_trace_log_lvl+0x1c4/0x2df
  ? show_trace_log_lvl+0x1c4/0x2df
  ? __irq_exit_rcu+0xa1/0xc0
  ? watchdog_timer_fn+0x1b2/0x210
  ? __pfx_watchdog_timer_fn+0x10/0x10
  ? __hrtimer_run_queues+0x127/0x2c0
  ? hrtimer_interrupt+0xfc/0x210
  ? __sysvec_apic_timer_interrupt+0x5c/0x110
  ? sysvec_apic_timer_interrupt+0x37/0x90
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? __do_softirq+0x78/0x2ac
  ? __do_softirq+0x60/0x2ac
  __irq_exit_rcu+0xa1/0xc0
  sysvec_call_function_single+0x72/0x90
  </IRQ>
  <TASK>
  asm_sysvec_call_function_single+0x16/0x20
 RIP: 0010:_raw_spin_unlock_irq+0x14/0x30
 RSP: 0018:ffffb253604cbd88 EFLAGS: 00000247
 RAX: 000000000001960d RBX: 0000000000000002 RCX: ffff8cad2a064800
 RDX: 000000008020001b RSI: 0000000000000001 RDI: ffff8cad5d39f66c
 RBP: ffff8cad5d39f600 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff8caa443e0c00 R11: ffffb253604cbcd8 R12: ffff8cacb8682538
 R13: 0000000000000005 R14: ffffb253604cbd90 R15: ffff8cad5d39f66c
  cm_process_send_error+0x122/0x1d0 [ib_cm]
  timeout_sends+0x1dd/0x270 [ib_core]
  process_one_work+0x1e2/0x3b0
  ? __pfx_worker_thread+0x10/0x10
  worker_thread+0x50/0x3a0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xdd/0x100
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x29/0x50
  </TASK>

Simplify the timeout handler by building a local list of timed-out WRs
and invoking the send handler after the list has been built. The new
method acquires and releases the lock only once to fetch the list, which
reduces lock contention when processing a large number of WRs.

Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Link: https://lore.kernel.org/r/20240722110325.195085-1-saravanan.vajravel@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/infiniband/core/mad.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
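
For illustration only (not part of the patch): the pattern described in
the commit message, moving timed-out entries to a local list under one
lock acquisition and then calling the handler with the lock dropped, can
be sketched in userspace C roughly as below. This is a minimal sketch
under stated assumptions: a pthread mutex stands in for the
mad_agent_priv spinlock, a singly linked list stands in for wait_list,
and every name (struct wr, struct agent, timeout_sends_sketch,
count_handler, demo_run) is hypothetical. retry_send() and agent
reference counting are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued send work request. */
struct wr {
	int id;
	int timed_out;		/* stand-in for the kernel's timeout check */
	struct wr *next;
};

/* Hypothetical stand-in for the agent's private state. */
struct agent {
	pthread_mutex_t lock;
	struct wr *wait_head;	/* wait list, earliest timeout first */
	int handled;
};

typedef void (*send_handler_t)(struct agent *, struct wr *);

/* Returns the number of timed-out WRs passed to the handler. */
static int timeout_sends_sketch(struct agent *a, send_handler_t handler)
{
	struct wr *local_head = NULL, **local_tail = &local_head;
	struct wr *w, *next;
	int n = 0;

	/* One lock/unlock pair: move every expired WR to the local list. */
	pthread_mutex_lock(&a->lock);
	while (a->wait_head && a->wait_head->timed_out) {
		w = a->wait_head;
		a->wait_head = w->next;
		w->next = NULL;
		*local_tail = w;
		local_tail = &w->next;
	}
	pthread_mutex_unlock(&a->lock);

	/* Handlers run without the lock held. */
	for (w = local_head; w; w = next) {
		next = w->next;	/* read before the handler may reuse w */
		handler(a, w);
		n++;
	}
	return n;
}

/* Example handler that takes the agent lock itself. */
static void count_handler(struct agent *a, struct wr *w)
{
	pthread_mutex_lock(&a->lock);
	a->handled++;
	pthread_mutex_unlock(&a->lock);
	w->next = NULL;
}

/* Queues three WRs, two of them expired, and drains the expired ones. */
static int demo_run(void)
{
	struct wr c = { 3, 0, NULL };
	struct wr b = { 2, 1, &c };
	struct wr a1 = { 1, 1, &b };
	struct agent ag = { .wait_head = &a1, .handled = 0 };
	int n;

	pthread_mutex_init(&ag.lock, NULL);
	n = timeout_sends_sketch(&ag, count_handler);
	assert(ag.wait_head == &c);	/* unexpired WR stays queued */
	assert(ag.handled == n);
	pthread_mutex_destroy(&ag.lock);
	return n;
}
```

In this sketch demo_run() returns 2. The handler can retake the lock
without deadlocking because the caller no longer holds it, and the lock
is taken once per pass rather than once per timed-out WR.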

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 7439e47ff951d..70708fea12962 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2616,14 +2616,16 @@ static int retry_send(struct ib_mad_send_wr_private *mad_send_wr)
 
 static void timeout_sends(struct work_struct *work)
 {
+	struct ib_mad_send_wr_private *mad_send_wr, *n;
 	struct ib_mad_agent_private *mad_agent_priv;
-	struct ib_mad_send_wr_private *mad_send_wr;
 	struct ib_mad_send_wc mad_send_wc;
+	struct list_head local_list;
 	unsigned long flags, delay;
 
 	mad_agent_priv = container_of(work, struct ib_mad_agent_private,
 				      timed_work.work);
 	mad_send_wc.vendor_err = 0;
+	INIT_LIST_HEAD(&local_list);
 
 	spin_lock_irqsave(&mad_agent_priv->lock, flags);
 	while (!list_empty(&mad_agent_priv->wait_list)) {
@@ -2641,13 +2643,16 @@ static void timeout_sends(struct work_struct *work)
 			break;
 		}
 
-		list_del(&mad_send_wr->agent_list);
+		list_del_init(&mad_send_wr->agent_list);
 		if (mad_send_wr->status == IB_WC_SUCCESS &&
 		    !retry_send(mad_send_wr))
 			continue;
 
-		spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
+		list_add_tail(&mad_send_wr->agent_list, &local_list);
+	}
+	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 
+	list_for_each_entry_safe(mad_send_wr, n, &local_list, agent_list) {
 		if (mad_send_wr->status == IB_WC_SUCCESS)
 			mad_send_wc.status = IB_WC_RESP_TIMEOUT_ERR;
 		else
@@ -2655,11 +2660,8 @@ static void timeout_sends(struct work_struct *work)
 		mad_send_wc.send_buf = &mad_send_wr->send_buf;
 		mad_agent_priv->agent.send_handler(&mad_agent_priv->agent,
 						   &mad_send_wc);
-
 		deref_mad_agent(mad_agent_priv);
-		spin_lock_irqsave(&mad_agent_priv->lock, flags);
 	}
-	spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
 }
 
 /*
-- 
2.43.0

