From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>,
Kaike Wan <kaike.wan@intel.com>,
Dennis Dalessandro <dennis.dalessandro@intel.com>,
Jason Gunthorpe <jgg@mellanox.com>,
Sasha Levin <sashal@kernel.org>,
linux-rdma@vger.kernel.org
Subject: [PATCH AUTOSEL 5.5 26/35] IB/hfi1: Ensure pq is not left on waitlist
Date: Mon, 6 Apr 2020 20:00:48 -0400 [thread overview]
Message-ID: <20200407000058.16423-26-sashal@kernel.org> (raw)
In-Reply-To: <20200407000058.16423-1-sashal@kernel.org>
From: Mike Marciniszyn <mike.marciniszyn@intel.com>
[ Upstream commit 9a293d1e21a6461a11b4217b155bf445e57f4131 ]
The following warning can occur when a pq is left on the dmawait list and
the pq is then freed:
WARNING: CPU: 47 PID: 3546 at lib/list_debug.c:29 __list_add+0x65/0xc0
list_add corruption. next->prev should be prev (ffff939228da1880), but was ffff939cabb52230. (next=ffff939cabb52230).
Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev drm_panel_orientation_quirks i2c_i801 mei_me lpc_ich mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci libahci i2c_algo_bit dca libata ptp pps_core crc32c_intel [last unloaded: i2c_algo_bit]
CPU: 47 PID: 3546 Comm: wrf.exe Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.41.1.el7.x86_64 #1
Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
Call Trace:
[<ffffffff91f65ac0>] dump_stack+0x19/0x1b
[<ffffffff91898b78>] __warn+0xd8/0x100
[<ffffffff91898bff>] warn_slowpath_fmt+0x5f/0x80
[<ffffffff91a1dabe>] ? ___slab_alloc+0x24e/0x4f0
[<ffffffff91b97025>] __list_add+0x65/0xc0
[<ffffffffc03926a5>] defer_packet_queue+0x145/0x1a0 [hfi1]
[<ffffffffc0372987>] sdma_check_progress+0x67/0xa0 [hfi1]
[<ffffffffc03779d2>] sdma_send_txlist+0x432/0x550 [hfi1]
[<ffffffff91a20009>] ? kmem_cache_alloc+0x179/0x1f0
[<ffffffffc0392973>] ? user_sdma_send_pkts+0xc3/0x1990 [hfi1]
[<ffffffffc0393e3a>] user_sdma_send_pkts+0x158a/0x1990 [hfi1]
[<ffffffff918ab65e>] ? try_to_del_timer_sync+0x5e/0x90
[<ffffffff91a3fe1a>] ? __check_object_size+0x1ca/0x250
[<ffffffffc0395546>] hfi1_user_sdma_process_request+0xd66/0x1280 [hfi1]
[<ffffffffc034e0da>] hfi1_aio_write+0xca/0x120 [hfi1]
[<ffffffff91a4245b>] do_sync_readv_writev+0x7b/0xd0
[<ffffffff91a4409e>] do_readv_writev+0xce/0x260
[<ffffffff918df69f>] ? pick_next_task_fair+0x5f/0x1b0
[<ffffffff918db535>] ? sched_clock_cpu+0x85/0xc0
[<ffffffff91f6b16a>] ? __schedule+0x13a/0x860
[<ffffffff91a442c5>] vfs_writev+0x35/0x60
[<ffffffff91a4447f>] SyS_writev+0x7f/0x110
[<ffffffff91f78ddb>] system_call_fastpath+0x22/0x27
The issue happens when wait_event_interruptible_timeout() returns a value
<= 0.
In that case, the pq is left on the list. The code continues sending
packets and potentially can complete the current request with the pq still
on the dmawait list provided no descriptor shortage is seen.
If the pq is torn down in that state, the sdma interrupt handler could
find the now freed pq on the list with list corruption or memory
corruption resulting.
Fix by adding a flush routine to ensure that the pq is never on a list
after processing a request.
A follow-up patch series will address issues with seqlock surfaced in:
https://lore.kernel.org/r/20200320003129.GP20941@ziepe.ca
The seqlock use for sdma will then be converted to a spin lock since the
list_empty() doesn't need the protection afforded by the sequence lock
currently in use.
Fixes: a0d406934a46 ("staging/rdma/hfi1: Add page lock limit check for SDMA requests")
Link: https://lore.kernel.org/r/20200320200200.23203.37777.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/infiniband/hw/hfi1/user_sdma.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index c2f0d9ba93de1..13e4203497b33 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -141,6 +141,7 @@ static int defer_packet_queue(
*/
xchg(&pq->state, SDMA_PKT_Q_DEFERRED);
if (list_empty(&pq->busy.list)) {
+ pq->busy.lock = &sde->waitlock;
iowait_get_priority(&pq->busy);
iowait_queue(pkts_sent, &pq->busy, &sde->dmawait);
}
@@ -155,6 +156,7 @@ static void activate_packet_queue(struct iowait *wait, int reason)
{
struct hfi1_user_sdma_pkt_q *pq =
container_of(wait, struct hfi1_user_sdma_pkt_q, busy);
+ pq->busy.lock = NULL;
xchg(&pq->state, SDMA_PKT_Q_ACTIVE);
wake_up(&wait->wait_dma);
};
@@ -256,6 +258,21 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
return ret;
}
+static void flush_pq_iowait(struct hfi1_user_sdma_pkt_q *pq)
+{
+ unsigned long flags;
+ seqlock_t *lock = pq->busy.lock;
+
+ if (!lock)
+ return;
+ write_seqlock_irqsave(lock, flags);
+ if (!list_empty(&pq->busy.list)) {
+ list_del_init(&pq->busy.list);
+ pq->busy.lock = NULL;
+ }
+ write_sequnlock_irqrestore(lock, flags);
+}
+
int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
struct hfi1_ctxtdata *uctxt)
{
@@ -281,6 +298,7 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
kfree(pq->reqs);
kfree(pq->req_in_use);
kmem_cache_destroy(pq->txreq_cache);
+ flush_pq_iowait(pq);
kfree(pq);
} else {
spin_unlock(&fd->pq_rcu_lock);
@@ -587,11 +605,12 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
if (ret < 0) {
if (ret != -EBUSY)
goto free_req;
- wait_event_interruptible_timeout(
+ if (wait_event_interruptible_timeout(
pq->busy.wait_dma,
- (pq->state == SDMA_PKT_Q_ACTIVE),
+ pq->state == SDMA_PKT_Q_ACTIVE,
msecs_to_jiffies(
- SDMA_IOWAIT_TIMEOUT));
+ SDMA_IOWAIT_TIMEOUT)) <= 0)
+ flush_pq_iowait(pq);
}
}
*count += idx;
--
2.20.1
next prev parent reply other threads:[~2020-04-07 0:08 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-07 0:00 [PATCH AUTOSEL 5.5 01/35] ARM: dts: sun8i-a83t-tbs-a711: HM5065 doesn't like such a high voltage Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 02/35] bus: sunxi-rsb: Return correct data when mixing 16-bit and 8-bit reads Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 03/35] ARM: dts: Fix dm814x Ethernet by changing to use rgmii-id mode Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 04/35] bpf: Fix deadlock with rq_lock in bpf_send_signal() Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 05/35] net/mlx5e: kTLS, Fix wrong value in record tracker enum Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 06/35] iwlwifi: mvm: take the required lock when clearing time event data Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 07/35] iwlwifi: consider HE capability when setting LDPC Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 08/35] iwlwifi: yoyo: don't add TLV offset when reading FIFOs Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 09/35] iwlwifi: dbg: don't abort if sending DBGC_SUSPEND_RESUME fails Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 10/35] iwlwifi: mvm: Fix rate scale NSS configuration Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 11/35] Input: tm2-touchkey - add support for Coreriver TC360 variant Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 12/35] soc: fsl: dpio: register dpio irq handlers after dpio create Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 13/35] rxrpc: Abstract out the calculation of whether there's Tx space Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 14/35] rxrpc: Fix call interruptibility handling Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 15/35] rxrpc: Fix sendmsg(MSG_WAITALL) handling Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 16/35] net: stmmac: platform: Fix misleading interrupt error msg Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 17/35] net: vxge: fix wrong __VA_ARGS__ usage Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 18/35] ARM: dts: omap4-droid4: Fix lost touchscreen interrupts Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 19/35] riscv: uaccess should be used in nommu mode Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 20/35] hinic: fix a bug of waitting for IO stopped Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 21/35] hinic: fix the bug of clearing event queue Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 22/35] hinic: fix out-of-order excution in arm cpu Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 23/35] hinic: fix wrong para of wait_for_completion_timeout Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 24/35] hinic: fix wrong value of MIN_SKB_LEN Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 25/35] selftests/net: add definition for SOL_DCCP to fix compilation errors for old libc Sasha Levin
2020-04-07 0:00 ` Sasha Levin [this message]
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 27/35] netfilter: nf_tables: Allow set back-ends to report partial overlaps on insertion Sasha Levin
2020-04-07 0:18 ` Stefano Brivio
2020-04-13 16:39 ` Sasha Levin
2020-04-13 20:38 ` Stefano Brivio
2020-04-14 15:08 ` Sasha Levin
2020-04-21 11:32 ` Pablo Neira Ayuso
2020-04-21 13:14 ` Greg KH
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 28/35] netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start() Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 29/35] netfilter: nft_set_rbtree: Detect partial overlaps on insertion Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 30/35] cxgb4/ptp: pass the sign of offset delta in FW CMD Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 31/35] drm/scheduler: fix rare NULL ptr race Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 32/35] cfg80211: Do not warn on same channel at the end of CSA Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 33/35] qlcnic: Fix bad kzalloc null test Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 34/35] i2c: st: fix missing struct parameter description Sasha Levin
2020-04-07 0:00 ` [PATCH AUTOSEL 5.5 35/35] i2c: pca-platform: Use platform_irq_get_optional Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200407000058.16423-26-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=dennis.dalessandro@intel.com \
--cc=jgg@mellanox.com \
--cc=kaike.wan@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mike.marciniszyn@intel.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).