stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Brian King <brking@linux.vnet.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 13/17] scsi: ibmvfc: Set default timeout to avoid crash during migration
Date: Fri,  5 Feb 2021 15:08:07 +0100	[thread overview]
Message-ID: <20210205140650.348317315@linuxfoundation.org> (raw)
In-Reply-To: <20210205140649.825180779@linuxfoundation.org>

From: Brian King <brking@linux.vnet.ibm.com>

[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 090ab377f65e5..50078a199fea0 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2890,8 +2890,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
-- 
2.27.0




  parent reply	other threads:[~2021-02-05 16:44 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-05 14:07 [PATCH 4.19 00/17] 4.19.174-rc1 review Greg Kroah-Hartman
2021-02-05 14:07 ` [PATCH 4.19 01/17] net: dsa: bcm_sf2: put device node before return Greg Kroah-Hartman
2021-02-05 14:07 ` [PATCH 4.19 02/17] ibmvnic: Ensure that CRQ entry read are correctly ordered Greg Kroah-Hartman
2021-02-05 14:07 ` [PATCH 4.19 03/17] ACPI: thermal: Do not call acpi_thermal_check() directly Greg Kroah-Hartman
2021-02-05 14:07 ` [PATCH 4.19 04/17] sysctl: handle overflow in proc_get_long Greg Kroah-Hartman
2021-02-05 14:07 ` [PATCH 4.19 05/17] net_sched: gen_estimator: support large ewma log Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 06/17] phy: cpcap-usb: Fix warning for missing regulator_disable Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 07/17] platform/x86: touchscreen_dmi: Add swap-x-y quirk for Goodix touchscreen on Estar Beauty HD tablet Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 08/17] platform/x86: intel-vbtn: Support for tablet mode on Dell Inspiron 7352 Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 09/17] x86: __always_inline __{rd,wr}msr() Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 10/17] scsi: scsi_transport_srp: Dont block target in failfast state Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 11/17] scsi: libfc: Avoid invoking response handler twice if ep is already completed Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 12/17] mac80211: fix fast-rx encryption check Greg Kroah-Hartman
2021-02-05 14:08 ` Greg Kroah-Hartman [this message]
2021-02-05 14:08 ` [PATCH 4.19 14/17] selftests/powerpc: Only test lwm/stmw on big endian Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 15/17] objtool: Dont fail on missing symbol table Greg Kroah-Hartman
2021-02-05 14:08 ` [PATCH 4.19 16/17] kthread: Extract KTHREAD_IS_PER_CPU Greg Kroah-Hartman
2021-02-06 23:25   ` Pavel Machek
2021-02-05 14:08 ` [PATCH 4.19 17/17] workqueue: Restrict affinity change to rescuer Greg Kroah-Hartman
2021-02-05 22:54 ` [PATCH 4.19 00/17] 4.19.174-rc1 review Pavel Machek
2021-02-06 14:43 ` Naresh Kamboju
2021-02-06 16:01 ` Guenter Roeck
2021-02-07 13:25 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210205140650.348317315@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=brking@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).