All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Zhu Yanjun <yanjunz@mellanox.com>,
	Leon Romanovsky <leonro@mellanox.com>,
	Jason Gunthorpe <jgg@mellanox.com>
Subject: [PATCH 5.4 47/66] RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq
Date: Tue, 18 Feb 2020 20:55:14 +0100	[thread overview]
Message-ID: <20200218190432.365113067@linuxfoundation.org> (raw)
In-Reply-To: <20200218190428.035153861@linuxfoundation.org>

From: Zhu Yanjun <yanjunz@mellanox.com>

commit 8ac0e6641c7ca14833a2a8c6f13d8e0a435e535c upstream.

When run stress tests with RXE, the following Call Traces often occur

  watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
  ...
  Call Trace:
  <IRQ>
  create_object+0x3f/0x3b0
  kmem_cache_alloc_node_trace+0x129/0x2d0
  __kmalloc_reserve.isra.52+0x2e/0x80
  __alloc_skb+0x83/0x270
  rxe_init_packet+0x99/0x150 [rdma_rxe]
  rxe_requester+0x34e/0x11a0 [rdma_rxe]
  rxe_do_task+0x85/0xf0 [rdma_rxe]
  tasklet_action_common.isra.21+0xeb/0x100
  __do_softirq+0xd0/0x298
  irq_exit+0xc5/0xd0
  smp_apic_timer_interrupt+0x68/0x120
  apic_timer_interrupt+0xf/0x20
  </IRQ>
  ...

The root cause is that tasklet is actually a softirq. In a tasklet
handler, another softirq handler is triggered. Usually these softirq
handlers run on the same cpu core. So this will cause "soft lockup Bug".

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/sw/rxe/rxe_comp.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -329,7 +329,7 @@ static inline enum comp_state check_ack(
 					qp->comp.psn = pkt->psn;
 					if (qp->req.wait_psn) {
 						qp->req.wait_psn = 0;
-						rxe_run_task(&qp->req.task, 1);
+						rxe_run_task(&qp->req.task, 0);
 					}
 				}
 				return COMPST_ERROR_RETRY;
@@ -463,7 +463,7 @@ static void do_complete(struct rxe_qp *q
 	 */
 	if (qp->req.wait_fence) {
 		qp->req.wait_fence = 0;
-		rxe_run_task(&qp->req.task, 1);
+		rxe_run_task(&qp->req.task, 0);
 	}
 }
 
@@ -479,7 +479,7 @@ static inline enum comp_state complete_a
 		if (qp->req.need_rd_atomic) {
 			qp->comp.timeout_retry = 0;
 			qp->req.need_rd_atomic = 0;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task, 0);
 		}
 	}
 
@@ -725,7 +725,7 @@ int rxe_completer(void *arg)
 							RXE_CNT_COMP_RETRY);
 					qp->req.need_retry = 1;
 					qp->comp.started_retry = 1;
-					rxe_run_task(&qp->req.task, 1);
+					rxe_run_task(&qp->req.task, 0);
 				}
 
 				if (pkt) {



  parent reply	other threads:[~2020-02-18 19:59 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-18 19:54 [PATCH 5.4 00/66] 5.4.21-stable review Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 01/66] Input: synaptics - switch T470s to RMI4 by default Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 02/66] Input: synaptics - enable SMBus on ThinkPad L470 Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 03/66] Input: synaptics - remove the LEN0049 dmi id from topbuttonpad list Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 04/66] ALSA: usb-audio: Fix UAC2/3 effect unit parsing Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 05/66] ALSA: hda/realtek - Add more codec supported Headset Button Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 06/66] ALSA: hda/realtek - Fix silent output on MSI-GL73 Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 07/66] ALSA: usb-audio: Apply sample rate quirk for Audioengine D1 Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 08/66] ACPI: EC: Fix flushing of pending work Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 09/66] ACPI: PM: s2idle: Avoid possible race related to the EC GPE Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 10/66] ACPICA: Introduce acpi_any_gpe_status_set() Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 11/66] ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 12/66] ALSA: usb-audio: sound: usb: usb true/false for bool return type Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 13/66] ALSA: usb-audio: Add clock validity quirk for Denon MC7000/MCX8000 Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 14/66] ext4: dont assume that mmp_nodename/bdevname have NUL Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 15/66] ext4: fix support for inode sizes > 1024 bytes Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 16/66] ext4: fix checksum errors with indexed dirs Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 17/66] ext4: add cond_resched() to ext4_protect_reserved_inode Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 18/66] ext4: improve explanation of a mount failure caused by a misconfigured kernel Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 19/66] Btrfs: fix race between using extent maps and merging them Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 20/66] btrfs: ref-verify: fix memory leaks Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 21/66] btrfs: print message when tree-log replay starts Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 22/66] btrfs: log message when rw remount is attempted with unclean tree-log Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 23/66] ARM: npcm: Bring back GPIOLIB support Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 24/66] gpio: xilinx: Fix bug where the wrong GPIO register is written to Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 25/66] arm64: ssbs: Fix context-switch when SSBS is present on all CPUs Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 26/66] xprtrdma: Fix DMA scatter-gather list mapping imbalance Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 27/66] cifs: make sure we do not overflow the max EA buffer size Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 28/66] EDAC/sysfs: Remove csrow objects on errors Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 29/66] EDAC/mc: Fix use-after-free and memleaks during device removal Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 30/66] KVM: nVMX: Use correct root level for nested EPT shadow page tables Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 31/66] perf/x86/amd: Add missing L2 misses event spec to AMD Family 17hs event map Greg Kroah-Hartman
2020-02-18 19:54 ` [PATCH 5.4 32/66] s390/pkey: fix missing length of protected key on return Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 33/66] s390/uv: Fix handling of length extensions Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 34/66] drm/vgem: Close use-after-free race in vgem_gem_create Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 35/66] drm/panfrost: Make sure the shrinker does not reclaim referenced BOs Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 36/66] bus: moxtet: fix potential stack buffer overflow Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 37/66] nvme: fix the parameter order for nvme_get_log in nvme_get_fw_slot_info Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 38/66] drivers: ipmi: fix off-by-one bounds check that leads to a out-of-bounds write Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 39/66] IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supported Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 40/66] IB/hfi1: Acquire lock to release TID entries when user file is closed Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 41/66] IB/hfi1: Close window for pq and request coliding Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 42/66] IB/rdmavt: Reset all QPs when the device is shut down Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 43/66] IB/umad: Fix kernel crash while unloading ib_umad Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 44/66] RDMA/core: Fix invalid memory access in spec_filter_size Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 45/66] RDMA/iw_cxgb4: initiate CLOSE when entering TERM Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 46/66] RDMA/hfi1: Fix memory leak in _dev_comp_vect_mappings_create Greg Kroah-Hartman
2020-02-18 19:55 ` Greg Kroah-Hartman [this message]
2020-02-18 19:55 ` [PATCH 5.4 48/66] RDMA/core: Fix protection fault in get_pkey_idx_qp_list Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 49/66] s390/time: Fix clk type in get_tod_clock Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 50/66] sched/uclamp: Reject negative values in cpu_uclamp_write() Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 51/66] spmi: pmic-arb: Set lockdep class for hierarchical irq domains Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 52/66] perf/x86/intel: Fix inaccurate period in context switch for auto-reload Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 53/66] hwmon: (pmbus/ltc2978) Fix PMBus polling of MFR_COMMON definitions Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 54/66] mac80211: fix quiet mode activation in action frames Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 55/66] cifs: fix mount option display for sec=krb5i Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 56/66] arm64: dts: fast models: Fix FVP PCI interrupt-map property Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 57/66] KVM: x86: Mask off reserved bit from #DB exception payload Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 58/66] perf stat: Dont report a null stalled cycles per insn metric Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 59/66] NFSv4.1 make cachethis=no for writes Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 60/66] Revert "drm/sun4i: drv: Allow framebuffer modifiers in mode config" Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 61/66] jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 62/66] jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 63/66] ext4: choose hardlimit when softlimit is larger than hardlimit in ext4_statfs_project() Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 64/66] KVM: x86/mmu: Fix struct guest_walker arrays for 5-level paging Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 65/66] gpio: add gpiod_toggle_active_low() Greg Kroah-Hartman
2020-02-18 19:55 ` [PATCH 5.4 66/66] mmc: core: Rework wp-gpio handling Greg Kroah-Hartman
2020-02-18 23:34 ` [PATCH 5.4 00/66] 5.4.21-stable review shuah
2020-02-19  3:34 ` Naresh Kamboju
     [not found] ` <20200218190428.035153861-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2020-02-19 11:06   ` Jon Hunter
2020-02-19 11:06     ` Jon Hunter
2020-02-19 18:09 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200218190432.365113067@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=jgg@mellanox.com \
    --cc=leonro@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=yanjunz@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.