linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Kaike Wan <kaike.wan@intel.com>,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	Jason Gunthorpe <jgg@mellanox.com>
Subject: [PATCH 5.4 32/36] IB/hfi1: Ensure pq is not left on waitlist
Date: Tue,  7 Apr 2020 12:22:05 +0200	[thread overview]
Message-ID: <20200407101458.345069338@linuxfoundation.org> (raw)
In-Reply-To: <20200407101454.281052964@linuxfoundation.org>

From: Mike Marciniszyn <mike.marciniszyn@intel.com>

commit 9a293d1e21a6461a11b4217b155bf445e57f4131 upstream.

The following warning can occur when a pq is left on the dmawait list and
the pq is then freed:

  WARNING: CPU: 47 PID: 3546 at lib/list_debug.c:29 __list_add+0x65/0xc0
  list_add corruption. next->prev should be prev (ffff939228da1880), but was ffff939cabb52230. (next=ffff939cabb52230).
  Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev drm_panel_orientation_quirks i2c_i801 mei_me lpc_ich mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
  nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci libahci i2c_algo_bit dca libata ptp pps_core crc32c_intel [last unloaded: i2c_algo_bit]
  CPU: 47 PID: 3546 Comm: wrf.exe Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.41.1.el7.x86_64 #1
  Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
  Call Trace:
  [<ffffffff91f65ac0>] dump_stack+0x19/0x1b
  [<ffffffff91898b78>] __warn+0xd8/0x100
  [<ffffffff91898bff>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffff91a1dabe>] ? ___slab_alloc+0x24e/0x4f0
  [<ffffffff91b97025>] __list_add+0x65/0xc0
  [<ffffffffc03926a5>] defer_packet_queue+0x145/0x1a0 [hfi1]
  [<ffffffffc0372987>] sdma_check_progress+0x67/0xa0 [hfi1]
  [<ffffffffc03779d2>] sdma_send_txlist+0x432/0x550 [hfi1]
  [<ffffffff91a20009>] ? kmem_cache_alloc+0x179/0x1f0
  [<ffffffffc0392973>] ? user_sdma_send_pkts+0xc3/0x1990 [hfi1]
  [<ffffffffc0393e3a>] user_sdma_send_pkts+0x158a/0x1990 [hfi1]
  [<ffffffff918ab65e>] ? try_to_del_timer_sync+0x5e/0x90
  [<ffffffff91a3fe1a>] ? __check_object_size+0x1ca/0x250
  [<ffffffffc0395546>] hfi1_user_sdma_process_request+0xd66/0x1280 [hfi1]
  [<ffffffffc034e0da>] hfi1_aio_write+0xca/0x120 [hfi1]
  [<ffffffff91a4245b>] do_sync_readv_writev+0x7b/0xd0
  [<ffffffff91a4409e>] do_readv_writev+0xce/0x260
  [<ffffffff918df69f>] ? pick_next_task_fair+0x5f/0x1b0
  [<ffffffff918db535>] ? sched_clock_cpu+0x85/0xc0
  [<ffffffff91f6b16a>] ? __schedule+0x13a/0x860
  [<ffffffff91a442c5>] vfs_writev+0x35/0x60
  [<ffffffff91a4447f>] SyS_writev+0x7f/0x110
  [<ffffffff91f78ddb>] system_call_fastpath+0x22/0x27

The issue happens when wait_event_interruptible_timeout() returns a value
<= 0.

In that case, the pq is left on the list. The code continues sending
packets and potentially can complete the current request with the pq still
on the dmawait list provided no descriptor shortage is seen.

If the pq is torn down in that state, the sdma interrupt handler could
find the now freed pq on the list with list corruption or memory
corruption resulting.

Fix by adding a flush routine to ensure that the pq is never on a list
after processing a request.

A follow-up patch series will address issues with seqlock surfaced in:
https://lore.kernel.org/r/20200320003129.GP20941@ziepe.ca

The seqlock use for sdma will then be converted to a spin lock since the
list_empty() doesn't need the protection afforded by the sequence lock
currently in use.

Fixes: a0d406934a46 ("staging/rdma/hfi1: Add page lock limit check for SDMA requests")
Link: https://lore.kernel.org/r/20200320200200.23203.37777.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/infiniband/hw/hfi1/user_sdma.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -141,6 +141,7 @@ static int defer_packet_queue(
 	 */
 	xchg(&pq->state, SDMA_PKT_Q_DEFERRED);
 	if (list_empty(&pq->busy.list)) {
+		pq->busy.lock = &sde->waitlock;
 		iowait_get_priority(&pq->busy);
 		iowait_queue(pkts_sent, &pq->busy, &sde->dmawait);
 	}
@@ -155,6 +156,7 @@ static void activate_packet_queue(struct
 {
 	struct hfi1_user_sdma_pkt_q *pq =
 		container_of(wait, struct hfi1_user_sdma_pkt_q, busy);
+	pq->busy.lock = NULL;
 	xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
 	wake_up(&wait->wait_dma);
 };
@@ -256,6 +258,21 @@ pq_reqs_nomem:
 	return ret;
 }
 
+static void flush_pq_iowait(struct hfi1_user_sdma_pkt_q *pq)
+{
+	unsigned long flags;
+	seqlock_t *lock = pq->busy.lock;
+
+	if (!lock)
+		return;
+	write_seqlock_irqsave(lock, flags);
+	if (!list_empty(&pq->busy.list)) {
+		list_del_init(&pq->busy.list);
+		pq->busy.lock = NULL;
+	}
+	write_sequnlock_irqrestore(lock, flags);
+}
+
 int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
 			       struct hfi1_ctxtdata *uctxt)
 {
@@ -281,6 +298,7 @@ int hfi1_user_sdma_free_queues(struct hf
 		kfree(pq->reqs);
 		kfree(pq->req_in_use);
 		kmem_cache_destroy(pq->txreq_cache);
+		flush_pq_iowait(pq);
 		kfree(pq);
 	} else {
 		spin_unlock(&fd->pq_rcu_lock);
@@ -587,11 +605,12 @@ int hfi1_user_sdma_process_request(struc
 		if (ret < 0) {
 			if (ret != -EBUSY)
 				goto free_req;
-			wait_event_interruptible_timeout(
+			if (wait_event_interruptible_timeout(
 				pq->busy.wait_dma,
-				(pq->state == SDMA_PKT_Q_ACTIVE),
+				pq->state == SDMA_PKT_Q_ACTIVE,
 				msecs_to_jiffies(
-					SDMA_IOWAIT_TIMEOUT));
+					SDMA_IOWAIT_TIMEOUT)) <= 0)
+				flush_pq_iowait(pq);
 		}
 	}
 	*count += idx;



  parent reply	other threads:[~2020-04-07 10:23 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-07 10:21 [PATCH 5.4 00/36] 5.4.31-rc1 review Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 01/36] nvme-rdma: Avoid double freeing of async event data Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 02/36] kconfig: introduce m32-flag and m64-flag Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 03/36] drm/amd/display: Add link_rate quirk for Apple 15" MBP 2017 Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 04/36] drm/bochs: downgrade pci_request_region failure from error to warning Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 05/36] initramfs: restore default compression behavior Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 06/36] drm/amdgpu: fix typo for vcn1 idle check Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 07/36] tools/power turbostat: Fix gcc build warnings Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 08/36] tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 09/36] tools/power turbostat: Fix 32-bit capabilities warning Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 10/36] bpf: Fix tnum constraints for 32-bit comparisons Greg Kroah-Hartman
2020-04-07 10:45   ` Daniel Borkmann
2020-04-07 14:42     ` Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 11/36] net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 12/36] XArray: Fix xa_find_next for large multi-index entries Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 13/36] padata: fix uninitialized return value in padata_replace() Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 14/36] brcmfmac: abort and release host after error Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 15/36] misc: rtsx: set correct pcr_ops for rts522A Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 16/36] misc: pci_endpoint_test: Fix to support > 10 pci-endpoint-test devices Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 17/36] misc: pci_endpoint_test: Avoid using module parameter to determine irqtype Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 18/36] PCI: sysfs: Revert "rescan" file renames Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 19/36] coresight: do not use the BIT() macro in the UAPI header Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 20/36] mei: me: add cedar fork device ids Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 21/36] nvmem: check for NULL reg_read and reg_write before dereferencing Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 22/36] extcon: axp288: Add wakeup support Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 23/36] power: supply: axp288_charger: Add special handling for HP Pavilion x2 10 Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 24/36] Revert "dm: always call blk_queue_split() in dm_process_bio()" Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 25/36] ALSA: hda/ca0132 - Add Recon3Di quirk to handle integrated sound on EVGA X99 Classified motherboard Greg Kroah-Hartman
2020-04-07 10:21 ` [PATCH 5.4 26/36] soc: mediatek: knows_txdone needs to be set in Mediatek CMDQ helper Greg Kroah-Hartman
2020-04-07 10:22 ` [PATCH 5.4 27/36] net/mlx5e: kTLS, Fix wrong value in record tracker enum Greg Kroah-Hartman
2020-04-07 10:22 ` [PATCH 5.4 28/36] iwlwifi: consider HE capability when setting LDPC Greg Kroah-Hartman
2020-04-07 10:22 ` [PATCH 5.4 29/36] iwlwifi: yoyo: dont add TLV offset when reading FIFOs Greg Kroah-Hartman
2020-04-07 10:22 ` [PATCH 5.4 30/36] iwlwifi: dbg: dont abort if sending DBGC_SUSPEND_RESUME fails Greg Kroah-Hartman
2020-04-07 10:22 ` [PATCH 5.4 31/36] rxrpc: Fix sendmsg(MSG_WAITALL) handling Greg Kroah-Hartman
2020-04-07 10:22 ` Greg Kroah-Hartman [this message]
2020-04-07 10:22 ` [PATCH 5.4 33/36] tcp: fix TFO SYNACK undo to avoid double-timestamp-undo Greg Kroah-Hartman
2020-04-07 10:22 ` [PATCH 5.4 34/36] i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device Greg Kroah-Hartman
2020-04-07 10:22 ` [PATCH 5.4 35/36] net: Fix Tx hash bound checking Greg Kroah-Hartman
2020-04-07 10:22 ` [PATCH 5.4 36/36] padata: always acquire cpu_hotplug_lock before pinst->lock Greg Kroah-Hartman
2020-04-07 12:37 ` [PATCH 5.4 00/36] 5.4.31-rc1 review Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200407101458.345069338@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dennis.dalessandro@intel.com \
    --cc=jgg@mellanox.com \
    --cc=kaike.wan@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mike.marciniszyn@intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).