public inbox for linux-kernel@vger.kernel.org
* [PATCH AUTOSEL 6.1 01/29] scsi: core: alua: I/O errors for ALUA state transitions
@ 2024-06-18 12:39 Sasha Levin
  2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 02/29] scsi: sr: Fix unintentional arithmetic wraparound Sasha Levin
                   ` (27 more replies)
  0 siblings, 28 replies; 34+ messages in thread
From: Sasha Levin @ 2024-06-18 12:39 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Martin Wilck, Rajashekhar M A, Hannes Reinecke, Damien Le Moal,
	Christoph Hellwig, Mike Christie, Martin K . Petersen,
	Sasha Levin, James.Bottomley, linux-scsi

From: Martin Wilck <martin.wilck@suse.com>

[ Upstream commit 10157b1fc1a762293381e9145041253420dfc6ad ]

When a host is configured with a few LUNs and I/O is running, injecting FC
faults repeatedly leads to path recovery problems. The LUNs have 4 paths
each. After an FC fault that takes 2 of the paths down, only 3 paths come
back active instead of all 4. This happens after several iterations of
continuous FC faults.

The reason is that we return an I/O error whenever we encounter sense code
06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION)
instead of retrying.

[mwilck: The original patch was developed by Rajashekhar M A and Hannes
Reinecke. I moved the code to alua_check_sense() as suggested by Mike
Christie [1]. Evan Milne had raised the question whether pg->state should
be set to transitioning in the UA case [2]. I believe that doing this is
correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause I/O
errors. Our handler schedules an RTPG, which will only result in an I/O
error condition if the transitioning timeout expires.]

[1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/
[2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@mail.gmail.com/

Co-developed-by: Rajashekhar M A <rajs@netapp.com>
Co-developed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <martin.wilck@suse.com>
Link: https://lore.kernel.org/r/20240514140344.19538-1-mwilck@suse.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 31 +++++++++++++++-------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 0781f991e7845..f5fc8631883d5 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -406,28 +406,40 @@ static char print_alua_state(unsigned char state)
 	}
 }
 
-static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
-					      struct scsi_sense_hdr *sense_hdr)
+static void alua_handle_state_transition(struct scsi_device *sdev)
 {
 	struct alua_dh_data *h = sdev->handler_data;
 	struct alua_port_group *pg;
 
+	rcu_read_lock();
+	pg = rcu_dereference(h->pg);
+	if (pg)
+		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
+	rcu_read_unlock();
+	alua_check(sdev, false);
+}
+
+static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
+					      struct scsi_sense_hdr *sense_hdr)
+{
 	switch (sense_hdr->sense_key) {
 	case NOT_READY:
 		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
 			/*
 			 * LUN Not Accessible - ALUA state transition
 			 */
-			rcu_read_lock();
-			pg = rcu_dereference(h->pg);
-			if (pg)
-				pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
-			rcu_read_unlock();
-			alua_check(sdev, false);
+			alua_handle_state_transition(sdev);
 			return NEEDS_RETRY;
 		}
 		break;
 	case UNIT_ATTENTION:
+		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
+			/*
+			 * LUN Not Accessible - ALUA state transition
+			 */
+			alua_handle_state_transition(sdev);
+			return NEEDS_RETRY;
+		}
 		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
 			/*
 			 * Power On, Reset, or Bus Device Reset.
@@ -494,7 +506,8 @@ static int alua_tur(struct scsi_device *sdev)
 
 	retval = scsi_test_unit_ready(sdev, ALUA_FAILOVER_TIMEOUT * HZ,
 				      ALUA_FAILOVER_RETRIES, &sense_hdr);
-	if (sense_hdr.sense_key == NOT_READY &&
+	if ((sense_hdr.sense_key == NOT_READY ||
+	     sense_hdr.sense_key == UNIT_ATTENTION) &&
 	    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
 		return SCSI_DH_RETRY;
 	else if (retval)
-- 
2.43.0
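
For readers who want to experiment with the classification outside the
kernel, here is a minimal userspace sketch of the check this patch extends.
It is not the driver code: struct sense_triple, is_alua_transition() and
classify_sense() are hypothetical names made up for illustration, and the
sketch deliberately ignores the other UNIT ATTENTION cases that
alua_check_sense() handles. Only the sense key values (0x02 NOT READY,
0x06 UNIT ATTENTION) and the 04/0a ASC/ASCQ pair are taken from the patch
and the SCSI sense key definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCSI sense key values; the kernel uses NOT_READY and UNIT_ATTENTION. */
enum { SK_NOT_READY = 0x02, SK_UNIT_ATTENTION = 0x06 };

/* Hypothetical stand-in for the fields of struct scsi_sense_hdr we need. */
struct sense_triple {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Hypothetical simplified outcome: retry the command or pass it on. */
enum disposition { DISP_PASS_ON, DISP_RETRY };

/*
 * True for "LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE
 * TRANSITION" (ASC 0x04, ASCQ 0x0a) reported with either sense key.
 * Before the patch, only the NOT READY variant was recognized.
 */
bool is_alua_transition(const struct sense_triple *s)
{
	return (s->sense_key == SK_NOT_READY ||
		s->sense_key == SK_UNIT_ATTENTION) &&
	       s->asc == 0x04 && s->ascq == 0x0a;
}

/* Assumption: everything that is not a transition is simply passed on. */
enum disposition classify_sense(const struct sense_triple *s)
{
	return is_alua_transition(s) ? DISP_RETRY : DISP_PASS_ON;
}

/* Small self-check covering both sense keys and one unrelated code. */
void check_examples(void)
{
	struct sense_triple nr = { SK_NOT_READY, 0x04, 0x0a };
	struct sense_triple ua = { SK_UNIT_ATTENTION, 0x04, 0x0a };
	struct sense_triple por = { SK_UNIT_ATTENTION, 0x29, 0x00 };

	assert(classify_sense(&nr) == DISP_RETRY);
	assert(classify_sense(&ua) == DISP_RETRY);
	assert(classify_sense(&por) == DISP_PASS_ON);
}
```

In the real handler, the retry path additionally marks the port group as
transitioning and schedules an RTPG, as the diff above shows. The sketch
only models the decision of which sense data counts as a transition.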



end of thread, other threads:[~2024-07-10 11:55 UTC | newest]

Thread overview: 34+ messages
2024-06-18 12:39 [PATCH AUTOSEL 6.1 01/29] scsi: core: alua: I/O errors for ALUA state transitions Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 02/29] scsi: sr: Fix unintentional arithmetic wraparound Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 03/29] scsi: qedf: Don't process stag work during unload and recovery Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 04/29] scsi: qedf: Wait for stag work during unload Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 05/29] scsi: qedf: Set qed_slowpath_params to zero before use Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 06/29] efi/libstub: zboot.lds: Discard .discard sections Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 07/29] efi: pstore: Return proper errors on UEFI failures Sasha Levin
2024-07-10  9:59   ` Pavel Machek
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 08/29] ACPI: EC: Abort address space access upon error Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 09/29] ACPI: EC: Avoid returning AE_OK on errors in address space handler Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 10/29] tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 11/29] wifi: mac80211: mesh: init nonpeer_pm to active by default in mesh sdata Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 12/29] wifi: mac80211: apply mcast rate only if interface is up Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 13/29] wifi: mac80211: handle tasklet frames before stopping Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 14/29] wifi: cfg80211: fix 6 GHz scan request building Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 15/29] wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 16/29] wifi: iwlwifi: mvm: Handle BIGTK cipher in kek_kck cmd Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 17/29] wifi: iwlwifi: mvm: properly set 6 GHz channel direct probe option Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 18/29] wifi: iwlwifi: mvm: Fix scan abort handling with HW rfkill Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 19/29] wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan() Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 20/29] selftests/openat2: Fix build warnings on ppc64 Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 21/29] selftests/futex: pass _GNU_SOURCE without a value to the compiler Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 22/29] of/irq: Factor out parsing of interrupt-map parent phandle+args from of_irq_parse_raw() Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 23/29] Input: silead - Always support 10 fingers Sasha Levin
2024-07-10  9:58   ` Pavel Machek
2024-07-10  9:59     ` Hans de Goede
2024-07-10 11:55       ` Pavel Machek
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 24/29] net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input() Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 25/29] ila: block BH in ila_output() Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 26/29] null_blk: fix validation of block size Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 27/29] kconfig: gconf: give a proper initial state to the Save button Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 28/29] kconfig: remove wrong expr_trans_bool() Sasha Levin
2024-06-18 12:39 ` [PATCH AUTOSEL 6.1 29/29] HID: Ignore battery for ELAN touchscreens 2F2C and 4116 Sasha Levin
  -- strict thread matches above, loose matches on Subject: below --
2024-06-17 13:24 [PATCH AUTOSEL 6.1 01/29] scsi: core: alua: I/O errors for ALUA state transitions Sasha Levin
2024-06-17 13:24 ` [PATCH AUTOSEL 6.1 15/29] wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup Sasha Levin
