stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Bill Kuzeja <william.kuzeja@stratus.com>,
	Himanshu Madhani <himanshu.madhani@cavium.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: [PATCH 4.4 22/38] scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
Date: Thu, 17 Nov 2016 11:33:00 +0100	[thread overview]
Message-ID: <20161117103237.449695865@linuxfoundation.org> (raw)
In-Reply-To: <20161117103236.423602981@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Bill Kuzeja <William.Kuzeja@stratus.com>

commit a5dd506e1584e91f3e7500ab9a165aa1b49eabd4 upstream.

A system can get hung task timeouts if a qlogic board fails during
initialization (if the board breaks again or fails the init). The hang
involves the scsi scan.

In a nutshell, since commit beb9e315e6e0 ("qla2xxx: Prevent removal and
board_disable race"):

...it is possible to have freed ha (base_vha->hw) early by a call to
qla2x00_remove_one when pdev->enable_cnt equals zero:

       if (!atomic_read(&pdev->enable_cnt)) {
               scsi_host_put(base_vha->host);
               kfree(ha);
               pci_set_drvdata(pdev, NULL);
               return;

Almost always, the scsi_host_put above frees the vha structure
(attached to the end of the Scsi_Host we're putting) since it's the last
put, and life is good.  However, if we are entering this routine because
the adapter has broken sometime during initialization AND a scsi scan is
already in progress (and has done its own scsi_host_get), vha will not
be freed. What's worse, the scsi scan will access the freed ha structure
through qla2xxx_scan_finished:

        if (time > vha->hw->loop_reset_delay * HZ)
                return 1;

The scsi scan keeps checking to see if a scan is complete by calling
qla2xxx_scan_finished. There is a timeout value that limits the length
of time a scan can take (hw->loop_reset_delay, usually set to 5
seconds), but this definition is in the data structure (hw) that can get
freed early.

This can yield unpredictable results, the worst of which is that the
scsi scan can hang indefinitely. This happens when the freed structure
gets reused and loop_reset_delay gets overwritten with garbage, which
the scan obliviously uses as its timeout value.

The fix for this is simple: at the top of qla2xxx_scan_finished, check
for the UNLOADING bit in the vha structure (_vha is not freed at this
point).  If UNLOADING is set, we exit the scan for this adapter
immediately. After this last reference to the ha structure, we'll exit
the scan for this adapter, and continue on.

This problem is hard to hit, but I have run into it doing negative
testing many times now (with a test specifically designed to bring it
out), so I can verify that this fix works. My testing has been against a
RHEL7 driver variant, but the bug and patch are equally relevant to to
the upstream driver.

Fixes: beb9e315e6e0 ("qla2xxx: Prevent removal and board_disable race")
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/scsi/qla2xxx/qla_os.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -2257,6 +2257,8 @@ qla2xxx_scan_finished(struct Scsi_Host *
 {
 	scsi_qla_host_t *vha = shost_priv(shost);
 
+	if (test_bit(UNLOADING, &vha->dpc_flags))
+		return 1;
 	if (!vha->host)
 		return 1;
 	if (time > vha->hw->loop_reset_delay * HZ)

  parent reply	other threads:[~2016-11-17 10:33 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-17 10:32 [PATCH 4.4 00/38] 4.4.33-stable review Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 01/38] ALSA: info: Return error for invalid read/write Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 02/38] ALSA: info: Limit the proc text input size Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 03/38] ASoC: cs4270: fix DAPM stream name mismatch Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 05/38] swapfile: fix memory corruption via malformed swapfile Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 06/38] coredump: fix unfreezable coredumping task Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 07/38] s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 08/38] ARC: timer: rtc: implement read loop in "C" vs. inline asm Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 09/38] pinctrl: cherryview: Serialize register access in suspend/resume Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 10/38] pinctrl: cherryview: Prevent possible interrupt storm on resume Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 11/38] staging: iio: ad5933: avoid uninitialized variable in error case Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 12/38] drivers: staging: nvec: remove bogus reset command for PS/2 interface Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 13/38] Revert "staging: nvec: ps2: change serio type to passthrough" Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 14/38] staging: nvec: remove managed resource from PS2 driver Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 15/38] USB: cdc-acm: fix TIOCMIWAIT Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 17/38] drbd: Fix kernel_sendmsg() usage - potential NULL deref Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 18/38] toshiba-wmi: Fix loading the driver on non Toshiba laptops Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 19/38] clk: qoriq: Dont allow CPU clocks higher than starting value Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 20/38] iio: hid-sensors: Increase the precision of scale to fix wrong reading interpretation Greg Kroah-Hartman
2016-11-17 10:32 ` [PATCH 4.4 21/38] iio: orientation: hid-sensor-rotation: Add PM function (fix non working driver) Greg Kroah-Hartman
2016-11-17 10:33 ` Greg Kroah-Hartman [this message]
2016-11-17 10:33 ` [PATCH 4.4 23/38] scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 26/38] dmaengine: at_xdmac: fix spurious flag status for mem2mem transfers Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 28/38] iommu/amd: Free domain id when free a domain of struct dma_ops_domain Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 29/38] iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 30/38] mei: bus: fix received data size check in NFC fixup Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 31/38] lib/genalloc.c: start search from start of chunk Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 32/38] hwrng: core - Dont use a stack buffer in add_early_randomness() Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 33/38] i40e: fix call of ndo_dflt_bridge_getlink() Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 34/38] ACPI / APEI: Fix incorrect return value of ghes_proc() Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 35/38] ASoC: sun4i-codec: return error code instead of NULL when create_card fails Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 36/38] mmc: mxs: Initialize the spinlock prior to using it Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 37/38] btrfs: qgroup: Prevent qgroup->reserved from going subzero Greg Kroah-Hartman
2016-11-17 10:33 ` [PATCH 4.4 38/38] netfilter: fix namespace handling in nf_log_proc_dostring Greg Kroah-Hartman
2016-11-17 11:03   ` Pablo Neira Ayuso
2016-11-17 12:01     ` Greg Kroah-Hartman
2016-11-17 22:22 ` [PATCH 4.4 00/38] 4.4.33-stable review Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161117103237.449695865@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=himanshu.madhani@cavium.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=stable@vger.kernel.org \
    --cc=william.kuzeja@stratus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).