* [PATCH 00/13] scsi: Support LUN/target based error handle
@ 2023-07-23 23:44 Wenchao Hao
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
The original error handler sets the whole host to the recovery state
while it performs error recovery, so none of the LUNs that share that
host can handle I/O until recovery finishes. This is unacceptable for
systems that deploy many LUNs behind one HBA.
This patchset introduces support for LUN/target based error handling;
drivers can choose whether to implement it. They can implement LUN
based, target based, or both kinds of error handling according to their
own error handling strategy. The first patch defines the framework,
which abstracts three key operations: adding an error command, waking up
the error handler, and blocking I/O while error commands are queued and
being recovered. Drivers should implement these three callbacks and
register them with the SCSI mid-level.
Besides the basic framework, this patchset also adds a basic LUN/target
based error handling strategy.
LUN based EH tries check sense, start unit and LUN reset. If these
steps cannot recover all error commands, it falls back to further
recovery: target based (if implemented) or host based error handling.
The same applies to target based EH: it tries check sense, start unit,
LUN reset and target reset. If these steps cannot recover all error
commands, it falls back to further recovery, which is host based error
handling.
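The fallback order described above can be sketched as a small user-space
model. This is only an illustration, not code from the patchset: the
names eh_scope, eh_step_fn, eh_escalate, step_fails and step_recovers
are hypothetical, and each step is reduced to a function that reports
whether every failed command was recovered.

```c
#include <stddef.h>

/* Hypothetical recovery scopes, narrowest first. */
enum eh_scope { EH_LUN, EH_TARGET, EH_HOST, EH_SCOPE_COUNT };

/* Hypothetical per-scope hook: returns 1 if all commands were recovered. */
typedef int (*eh_step_fn)(void);

/*
 * Try each scope in order and stop at the first one that recovers
 * everything. A NULL entry models a scope the driver did not implement.
 * Returns the scope that succeeded, or -1 if every scope failed.
 */
static int eh_escalate(eh_step_fn steps[EH_SCOPE_COUNT])
{
	for (int s = EH_LUN; s < EH_SCOPE_COUNT; s++) {
		if (steps[s] && steps[s]())
			return s;
	}
	return -1;
}

/* Stand-in hooks for experimenting with the model. */
static int step_fails(void) { return 0; }
static int step_recovers(void) { return 1; }
```

With { step_fails, NULL, step_recovers } the model skips the
unimplemented target scope and ends at EH_HOST, which mirrors the
"target based (if implemented) or host based" wording above.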
This patchset was tested with scsi_debug, which supports single LUN
error injection. The scsi_debug patches are here:
https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
Wenchao Hao (13):
scsi: Define basic framework for driver LUN/target based error handle
scsi:scsi_error: Move complete variable eh_action from shost to sdevice
scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
scsi:scsi_error: Add flags to mark error handle steps has done
scsi:scsi_error: Define helper to perform LUN based error handle
scsi:scsi_error: Add LUN based error handler based previous helper
scsi:core: increase/decrease target_busy without check can_queue
scsi:scsi_error: Define helper to perform target based error handle
scsi:scsi_error: Add target based error handler based previous helper
scsi:scsi_debug: Add param to control if setup LUN based error handle
scsi:scsi_debug: Add param to control if setup target based error handle
drivers/scsi/scsi_debug.c | 19 +
drivers/scsi/scsi_error.c | 705 ++++++++++++++++++++++++++++++++++---
drivers/scsi/scsi_lib.c | 23 +-
drivers/scsi/scsi_priv.h | 20 ++
include/scsi/scsi_device.h | 97 +++++
include/scsi/scsi_eh.h | 4 +
include/scsi/scsi_host.h | 2 -
7 files changed, 813 insertions(+), 57 deletions(-)
--
2.35.3
* [PATCH 01/13] scsi: Define basic framework for driver LUN/target based error handle
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
The SCSI mid-level is responsible for handling failed SCSI commands. The
traditional handling logic is host based: once a SCSI command on one LUN
of a host fails, the SCSI mid-level sets the whole host to the recovery
state, and no I/O can be submitted to any LUN of that host until recovery
finishes, which might take a long time. This is unreasonable when there
are many LUNs on one host.
So this change introduces a way for a driver to implement its own error
handling logic, which can use a single SCSI LUN or SCSI target as the
minimum unit.
scsi_device_eh is defined for error handling based on a SCSI LUN, and a
pointer struct scsi_device_eh "eh" is added to scsi_device; it is NULL
by default.
LLDs can initialize sdev->eh in hostt->slave_alloc to implement SCSI LUN
based error handling. If this member is not NULL, the SCSI mid-level
branches to the driver's error handler rather than the traditional one.
scsi_target_eh is defined for error handling based on a SCSI target, and
a pointer struct scsi_target_eh "eh" is added to scsi_target; it is NULL
by default.
LLDs can initialize starget->eh in hostt->target_alloc to implement SCSI
target based error handling. If this member is not NULL, the SCSI
mid-level branches to the driver's error handler rather than the
traditional one.
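The "branch to the driver's handler if the pointer is not NULL" idea can
be sketched in user space as below. This is a simplified model under
stated assumptions, not the patch's code: struct lun, struct target,
struct cmd, enum handled_by and every function name here are
hypothetical stand-ins, and the "in recovery" checks of the real
function are left out. The sketch stops at the first level that accepts
the command.

```c
#include <stddef.h>

/* Hypothetical marker recording which level took the command. */
enum handled_by { BY_NONE, BY_LUN, BY_TARGET, BY_HOST };

struct cmd;

/* Reduced stand-ins for scsi_device_eh / scsi_target_eh (add_cmnd only). */
struct lun_eh { void (*add_cmnd)(struct cmd *c); };
struct tgt_eh { void (*add_cmnd)(struct cmd *c); };

struct target { struct tgt_eh *eh; };                    /* eh NULL by default */
struct lun { struct lun_eh *eh; struct target *tgt; };   /* eh NULL by default */
struct cmd { struct lun *lun; enum handled_by by; };

/* Returns 0 if the LUN level took the command, 1 if there is no handler. */
static int add_to_lun(struct cmd *c)
{
	struct lun_eh *eh = c->lun->eh;

	if (!eh || !eh->add_cmnd)
		return 1;
	eh->add_cmnd(c);
	return 0;
}

/* Same contract for the target level. */
static int add_to_target(struct cmd *c)
{
	struct tgt_eh *eh = c->lun->tgt->eh;

	if (!eh || !eh->add_cmnd)
		return 1;
	eh->add_cmnd(c);
	return 0;
}

/* Host level always exists; it models the traditional handler. */
static void add_to_host(struct cmd *c) { c->by = BY_HOST; }

/* Narrowest handler first, host based handling as the last resort. */
static void eh_add(struct cmd *c)
{
	if (add_to_lun(c) == 0)
		return;
	if (add_to_target(c) == 0)
		return;
	add_to_host(c);
}

/* Example driver callbacks. */
static void demo_lun_add(struct cmd *c) { c->by = BY_LUN; }
static void demo_target_add(struct cmd *c) { c->by = BY_TARGET; }
```

A driver that leaves both pointers NULL keeps today's behaviour in this
model: every failed command ends up at the host level.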
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 57 +++++++++++++++++++++++++++++++-
drivers/scsi/scsi_lib.c | 12 +++++++
drivers/scsi/scsi_priv.h | 16 +++++++++
include/scsi/scsi_device.h | 67 ++++++++++++++++++++++++++++++++++++++
4 files changed, 151 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..1d1d97b94613 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -290,11 +290,48 @@ static void scsi_eh_inc_host_failed(struct rcu_head *head)
spin_unlock_irqrestore(shost->host_lock, flags);
}
+#define SCSI_EH_NO_HANDLER 1
+
+static int __scsi_eh_scmd_add_sdev(struct scsi_cmnd *scmd)
+{
+ struct scsi_device *sdev = scmd->device;
+ struct scsi_device_eh *eh = sdev->eh;
+
+ if (!eh || !eh->add_cmnd)
+ return SCSI_EH_NO_HANDLER;
+
+ scsi_eh_reset(scmd);
+ eh->add_cmnd(scmd);
+
+ if (eh->wakeup)
+ eh->wakeup(sdev);
+
+ return 0;
+}
+
+static int __scsi_eh_scmd_add_starget(struct scsi_cmnd *scmd)
+{
+ struct scsi_device *sdev = scmd->device;
+ struct scsi_target *starget = scsi_target(sdev);
+ struct scsi_target_eh *eh = starget->eh;
+
+ if (!eh || !eh->add_cmnd)
+ return SCSI_EH_NO_HANDLER;
+
+ scsi_eh_reset(scmd);
+ eh->add_cmnd(scmd);
+
+ if (eh->wakeup)
+ eh->wakeup(starget);
+
+ return 0;
+}
+
/**
* scsi_eh_scmd_add - add scsi cmd to error handling.
* @scmd: scmd to run eh on.
*/
-void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
+static void __scsi_eh_scmd_add(struct scsi_cmnd *scmd)
{
struct Scsi_Host *shost = scmd->device->host;
unsigned long flags;
@@ -320,6 +357,24 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
call_rcu_hurry(&scmd->rcu, scsi_eh_inc_host_failed);
}
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
+{
+ struct scsi_device *sdev = scmd->device;
+ struct scsi_target *starget = scsi_target(sdev);
+ struct Scsi_Host *shost = sdev->host;
+
+ if (unlikely(scsi_host_in_recovery(shost)))
+ __scsi_eh_scmd_add(scmd);
+
+ if (unlikely(scsi_target_in_recovery(starget)))
+ if (__scsi_eh_scmd_add_starget(scmd))
+ __scsi_eh_scmd_add(scmd);
+
+ if (__scsi_eh_scmd_add_sdev(scmd))
+ if (__scsi_eh_scmd_add_starget(scmd))
+ __scsi_eh_scmd_add(scmd);
+}
+
/**
* scsi_timeout - Timeout function for normal scsi commands.
* @req: request that is timing out.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ad9afae49544..db0a42fe49c0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -298,6 +298,12 @@ void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd)
sbitmap_put(&sdev->budget_map, cmd->budget_token);
cmd->budget_token = -1;
+
+ if (sdev->eh && sdev->eh->wakeup)
+ sdev->eh->wakeup(sdev);
+
+ if (starget->eh && starget->eh->wakeup)
+ starget->eh->wakeup(starget);
}
static void scsi_kick_queue(struct request_queue *q)
@@ -1253,6 +1259,9 @@ static inline int scsi_dev_queue_ready(struct request_queue *q,
{
int token;
+ if (scsi_device_in_recovery(sdev))
+ return -1;
+
token = sbitmap_get(&sdev->budget_map);
if (atomic_read(&sdev->device_blocked)) {
if (token < 0)
@@ -1288,6 +1297,9 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
struct scsi_target *starget = scsi_target(sdev);
unsigned int busy;
+ if (scsi_target_in_recovery(starget))
+ return 0;
+
if (starget->single_lun) {
spin_lock_irq(shost->host_lock);
if (starget->starget_sdev_user &&
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f42388ecb024..484c2f61ffe7 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -196,6 +196,22 @@ static inline void scsi_dh_add_device(struct scsi_device *sdev) { }
static inline void scsi_dh_release_device(struct scsi_device *sdev) { }
#endif
+static inline int scsi_device_in_recovery(struct scsi_device *sdev)
+{
+ struct scsi_device_eh *eh = sdev->eh;
+ if (eh && eh->is_busy)
+ return eh->is_busy(sdev);
+ return 0;
+}
+
+static inline int scsi_target_in_recovery(struct scsi_target *starget)
+{
+ struct scsi_target_eh *eh = starget->eh;
+ if (eh && eh->is_busy)
+ return eh->is_busy(starget);
+ return 0;
+}
+
struct bsg_device *scsi_bsg_register_queue(struct scsi_device *sdev);
extern int scsi_device_max_queue_depth(struct scsi_device *sdev);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 75b2235b99e2..08ed9a03015d 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -104,6 +104,71 @@ enum scsi_vpd_parameters {
SCSI_VPD_HEADER_SIZE = 4,
};
+struct scsi_device;
+struct scsi_target;
+
+struct scsi_device_eh {
+ /*
+ * add scsi command to error handler so it would be handled by
+ * driver's error handle strategy
+ */
+ void (*add_cmnd)(struct scsi_cmnd *scmd);
+
+ /*
+ * to judge if the device is busy handling errors, called before
+ * dispatch scsi cmnd
+ *
+ * return 0 if it's ready to accept scsi cmnd
+ * return non-zero if it's in error handle, commands would not be dispatched
+ */
+ int (*is_busy)(struct scsi_device *sdev);
+
+ /*
+ * wakeup device's error handle
+ *
+ * usually the error handler strategy would not run at once when
+ * error command is added. This function would be called when any
+ * scsi cmnd is finished or when scsi cmnd is added.
+ */
+ int (*wakeup)(struct scsi_device *sdev);
+
+ /*
+ * data entity for device specific error handler
+ */
+ unsigned long driver_data[];
+};
+
+struct scsi_target_eh {
+ /*
+ * add scsi command to error handler so it would be handled by
+ * driver's error handle strategy
+ */
+ void (*add_cmnd)(struct scsi_cmnd *scmd);
+
+ /*
+ * to judge if the target is busy handling errors, called before
+ * dispatch scsi cmnd
+ *
+ * return 0 if it's ready to accept scsi cmnd
+ * return non-zero if it's in error handle, commands would not be dispatched
+ */
+ int (*is_busy)(struct scsi_target *starget);
+
+ /*
+ * wakeup device's error handle
+ *
+ * usually the error handler strategy would not run at once when
+ * error command is added. This function would be called when any
+ * scsi cmnd is finished or when scsi cmnd is added.
+ */
+ int (*wakeup)(struct scsi_target *starget);
+
+ /*
+ * data entity for device specific error handler
+ */
+ unsigned long driver_data[];
+};
+
struct scsi_device {
struct Scsi_Host *host;
struct request_queue *request_queue;
@@ -258,6 +323,7 @@ struct scsi_device {
struct mutex state_mutex;
enum scsi_device_state sdev_state;
struct task_struct *quiesced_by;
+ struct scsi_device_eh *eh;
unsigned long sdev_data[];
} __attribute__((aligned(sizeof(unsigned long))));
@@ -344,6 +410,7 @@ struct scsi_target {
char scsi_level;
enum scsi_target_state state;
void *hostdata; /* available to low-level driver */
+ struct scsi_target_eh *eh;
unsigned long starget_data[]; /* for the transport */
/* starget_data must be the last element!!!! */
} __attribute__((aligned(sizeof(unsigned long))));
--
2.35.3
* [PATCH 02/13] scsi:scsi_error: Move complete variable eh_action from shost to sdevice
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
eh_action is used to wait for the completion of a SCSI command that is
sent during error handling. Now that the error handler might be based on
a scsi_device, move it to scsi_device.
This is preparation for a general LUN/target based error handling
strategy.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 6 +++---
include/scsi/scsi_device.h | 2 ++
include/scsi/scsi_host.h | 2 --
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1d1d97b94613..879fdd7c165b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -914,7 +914,7 @@ void scsi_eh_done(struct scsi_cmnd *scmd)
SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd,
"%s result: %x\n", __func__, scmd->result));
- eh_action = scmd->device->host->eh_action;
+ eh_action = scmd->device->eh_action;
if (eh_action)
complete(eh_action);
}
@@ -1203,7 +1203,7 @@ static enum scsi_disposition scsi_send_eh_cmnd(struct scsi_cmnd *scmd,
retry:
scsi_eh_prep_cmnd(scmd, &ses, cmnd, cmnd_size, sense_bytes);
- shost->eh_action = &done;
+ sdev->eh_action = &done;
scsi_log_send(scmd);
scmd->submitter = SUBMITTED_BY_SCSI_ERROR_HANDLER;
@@ -1246,7 +1246,7 @@ static enum scsi_disposition scsi_send_eh_cmnd(struct scsi_cmnd *scmd,
rtn = SUCCESS;
}
- shost->eh_action = NULL;
+ sdev->eh_action = NULL;
scsi_log_completion(scmd, rtn);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 08ed9a03015d..1894ba1c82cd 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -324,6 +324,8 @@ struct scsi_device {
enum scsi_device_state sdev_state;
struct task_struct *quiesced_by;
struct scsi_device_eh *eh;
+ struct completion *eh_action; /* Wait for specific actions on the
+ device. */
unsigned long sdev_data[];
} __attribute__((aligned(sizeof(unsigned long))));
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 70b7475dcf56..def0d99e9b36 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -554,8 +554,6 @@ struct Scsi_Host {
struct list_head eh_abort_list;
struct list_head eh_cmd_q;
struct task_struct * ehandler; /* Error recovery thread. */
- struct completion * eh_action; /* Wait for specific actions on the
- host. */
wait_queue_head_t host_wait;
const struct scsi_host_template *hostt;
struct scsi_transport_template *transportt;
--
2.35.3
* [PATCH 03/13] scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
This is preparation for a general LUN/target based error handling
strategy. The strategy reuses some of the original error handler APIs,
but some steps of these functions should not be performed. For
example, we should not perform a target reset if we have only stopped
I/O on a single LUN.
This change adds checks in scsi_try_xxx_reset so that a reset is
performed only if its condition is satisfied: host and bus resets
require the host to be in recovery, and a target reset requires the
target or the host to be in recovery.
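The conditions can be written as a single predicate. The sketch below is
a user-space illustration only; reset_kind, recovery_state and
reset_allowed are hypothetical names, and the two booleans stand in for
what scsi_target_in_recovery() and scsi_host_in_recovery() report.

```c
#include <stdbool.h>

/* Hypothetical names for the reset levels, narrowest first. */
enum reset_kind { RESET_LUN, RESET_TARGET, RESET_BUS, RESET_HOST };

/* Stand-in for the recovery state of the target and the host. */
struct recovery_state {
	bool target_in_recovery;
	bool host_in_recovery;
};

/* Returns true if a reset of the given kind may be attempted. */
static bool reset_allowed(enum reset_kind kind,
			  const struct recovery_state *st)
{
	switch (kind) {
	case RESET_HOST:
	case RESET_BUS:
		/* Host-wide resets need the whole host in recovery. */
		return st->host_in_recovery;
	case RESET_TARGET:
		/* A target reset needs the target or the host in recovery. */
		return st->target_in_recovery || st->host_in_recovery;
	case RESET_LUN:
		/* This patch adds no extra check for the LUN reset. */
		return true;
	}
	return false;
}
```

When only a single LUN is being recovered, both booleans are false, so
the model allows just the LUN reset and refuses the wider ones.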
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 879fdd7c165b..d80492366527 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -930,6 +930,9 @@ static enum scsi_disposition scsi_try_host_reset(struct scsi_cmnd *scmd)
struct Scsi_Host *host = scmd->device->host;
const struct scsi_host_template *hostt = host->hostt;
+ if (!scsi_host_in_recovery(host))
+ return FAILED;
+
SCSI_LOG_ERROR_RECOVERY(3,
shost_printk(KERN_INFO, host, "Snd Host RST\n"));
@@ -960,6 +963,9 @@ static enum scsi_disposition scsi_try_bus_reset(struct scsi_cmnd *scmd)
struct Scsi_Host *host = scmd->device->host;
const struct scsi_host_template *hostt = host->hostt;
+ if (!scsi_host_in_recovery(host))
+ return FAILED;
+
SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd,
"%s: Snd Bus RST\n", __func__));
@@ -1001,6 +1007,10 @@ static enum scsi_disposition scsi_try_target_reset(struct scsi_cmnd *scmd)
enum scsi_disposition rtn;
struct Scsi_Host *host = scmd->device->host;
const struct scsi_host_template *hostt = host->hostt;
+ struct scsi_target *starget = scsi_target(scmd->device);
+
+ if (!(scsi_target_in_recovery(starget) || scsi_host_in_recovery(host)))
+ return FAILED;
if (!hostt->eh_target_reset_handler)
return FAILED;
@@ -1008,7 +1018,7 @@ static enum scsi_disposition scsi_try_target_reset(struct scsi_cmnd *scmd)
rtn = hostt->eh_target_reset_handler(scmd);
if (rtn == SUCCESS) {
spin_lock_irqsave(host->host_lock, flags);
- __starget_for_each_device(scsi_target(scmd->device), NULL,
+ __starget_for_each_device(starget, NULL,
__scsi_report_device_reset);
spin_unlock_irqrestore(host->host_lock, flags);
}
--
2.35.3
* [PATCH 04/13] scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Add helper function scsi_eh_sdev_stu() to perform START_UNIT and check
whether some error commands can be finished.
This is preparation for a general LUN/target based error handling
strategy and does not change the original logic.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 50 +++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d80492366527..b7842d927af3 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1549,6 +1549,31 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd)
return 1;
}
+static int scsi_eh_sdev_stu(struct scsi_cmnd *scmd,
+ struct list_head *work_q,
+ struct list_head *done_q)
+{
+ struct scsi_device *sdev = scmd->device;
+ struct scsi_cmnd *next;
+
+ SCSI_LOG_ERROR_RECOVERY(3, sdev_printk(KERN_INFO, sdev,
+ "%s: Sending START_UNIT\n", current->comm));
+
+ if (scsi_eh_try_stu(scmd)) {
+ SCSI_LOG_ERROR_RECOVERY(3, sdev_printk(KERN_INFO, sdev,
+ "%s: START_UNIT failed\n", current->comm));
+ return 0;
+ }
+
+ if (!scsi_device_online(sdev) || !scsi_eh_tur(scmd))
+ list_for_each_entry_safe(scmd, next, work_q, eh_entry)
+ if (scmd->device == sdev &&
+ scsi_eh_action(scmd, SUCCESS) == SUCCESS)
+ scsi_eh_finish_cmd(scmd, done_q);
+
+ return list_empty(work_q);
+}
+
/**
* scsi_eh_stu - send START_UNIT if needed
* @shost: &scsi host being recovered.
@@ -1563,7 +1588,7 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
struct list_head *work_q,
struct list_head *done_q)
{
- struct scsi_cmnd *scmd, *stu_scmd, *next;
+ struct scsi_cmnd *scmd, *stu_scmd;
struct scsi_device *sdev;
shost_for_each_device(sdev, shost) {
@@ -1586,26 +1611,9 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
if (!stu_scmd)
continue;
- SCSI_LOG_ERROR_RECOVERY(3,
- sdev_printk(KERN_INFO, sdev,
- "%s: Sending START_UNIT\n",
- current->comm));
-
- if (!scsi_eh_try_stu(stu_scmd)) {
- if (!scsi_device_online(sdev) ||
- !scsi_eh_tur(stu_scmd)) {
- list_for_each_entry_safe(scmd, next,
- work_q, eh_entry) {
- if (scmd->device == sdev &&
- scsi_eh_action(scmd, SUCCESS) == SUCCESS)
- scsi_eh_finish_cmd(scmd, done_q);
- }
- }
- } else {
- SCSI_LOG_ERROR_RECOVERY(3,
- sdev_printk(KERN_INFO, sdev,
- "%s: START_UNIT failed\n",
- current->comm));
+ if (scsi_eh_sdev_stu(stu_scmd, work_q, done_q)) {
+ scsi_device_put(sdev);
+ break;
}
}
--
2.35.3
* [PATCH 05/13] scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Add helper function scsi_eh_sdev_reset() to perform a LUN reset and
check whether some error commands can be finished.
This is preparation for a general LUN/target based error handling
strategy and does not change the original logic.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 54 +++++++++++++++++++++++----------------
1 file changed, 32 insertions(+), 22 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index b7842d927af3..4cd6847e90cf 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1620,6 +1620,34 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
return list_empty(work_q);
}
+static int scsi_eh_sdev_reset(struct scsi_cmnd *scmd,
+ struct list_head *work_q,
+ struct list_head *done_q)
+{
+ struct scsi_cmnd *next;
+ struct scsi_device *sdev = scmd->device;
+ enum scsi_disposition rtn;
+
+ SCSI_LOG_ERROR_RECOVERY(3, sdev_printk(KERN_INFO, sdev,
+ "%s: Sending BDR\n", current->comm));
+
+ rtn = scsi_try_bus_device_reset(scmd);
+ if (rtn != SUCCESS && rtn != FAST_IO_FAIL) {
+ SCSI_LOG_ERROR_RECOVERY(3,
+ sdev_printk(KERN_INFO, sdev,
+ "%s: BDR failed\n", current->comm));
+ return 0;
+ }
+
+ if (!scsi_device_online(sdev) || rtn == FAST_IO_FAIL ||
+ !scsi_eh_tur(scmd))
+ list_for_each_entry_safe(scmd, next, work_q, eh_entry)
+ if (scmd->device == sdev &&
+ scsi_eh_action(scmd, rtn) != FAILED)
+ scsi_eh_finish_cmd(scmd, done_q);
+
+ return list_empty(work_q);
+}
/**
* scsi_eh_bus_device_reset - send bdr if needed
@@ -1637,9 +1665,8 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
struct list_head *work_q,
struct list_head *done_q)
{
- struct scsi_cmnd *scmd, *bdr_scmd, *next;
+ struct scsi_cmnd *scmd, *bdr_scmd;
struct scsi_device *sdev;
- enum scsi_disposition rtn;
shost_for_each_device(sdev, shost) {
if (scsi_host_eh_past_deadline(shost)) {
@@ -1660,26 +1687,9 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
if (!bdr_scmd)
continue;
- SCSI_LOG_ERROR_RECOVERY(3,
- sdev_printk(KERN_INFO, sdev,
- "%s: Sending BDR\n", current->comm));
- rtn = scsi_try_bus_device_reset(bdr_scmd);
- if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
- if (!scsi_device_online(sdev) ||
- rtn == FAST_IO_FAIL ||
- !scsi_eh_tur(bdr_scmd)) {
- list_for_each_entry_safe(scmd, next,
- work_q, eh_entry) {
- if (scmd->device == sdev &&
- scsi_eh_action(scmd, rtn) != FAILED)
- scsi_eh_finish_cmd(scmd,
- done_q);
- }
- }
- } else {
- SCSI_LOG_ERROR_RECOVERY(3,
- sdev_printk(KERN_INFO, sdev,
- "%s: BDR failed\n", current->comm));
+ if (scsi_eh_sdev_reset(bdr_scmd, work_q, done_q)) {
+ scsi_device_put(sdev);
+ break;
}
}
--
2.35.3
* [PATCH 06/13] scsi:scsi_error: Add flags to mark error handle steps has done
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
LUN based error handling mainly performs three steps to recover
commands: check sense, start unit, and LUN reset. It can fall back to
target/host based error handling, which performs these steps too.
Target based error handling resets the target; it can also fall back
to host based error handling.
Add flags that mark these steps as done, to avoid repeating them.
The flags should be cleared when the LUN/target based error handler is
woken up or when target/host based error handling finishes, and set
when falling back to target/host based error handling.
scsi_eh_get_sense, scsi_eh_stu, scsi_eh_bus_device_reset and
scsi_eh_target_reset check these flags before taking any action.
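The skip rule can be illustrated with a small user-space sketch. The
bitfield names follow the flags this patch adds to scsi_device_eh, but
struct lun_eh_flags, enum lun_step, step_should_skip and lun_flags_clear
are hypothetical helpers written for this example, and the target's
reset_done flag is passed in as a plain int.

```c
/* Per-LUN "already done" flags, named after the patch's bitfields. */
struct lun_eh_flags {
	unsigned get_sense_done:1;
	unsigned stu_done:1;
	unsigned reset_done:1;
};

/* The three per-LUN recovery steps. */
enum lun_step { STEP_GET_SENSE, STEP_STU, STEP_LUN_RESET };

/*
 * Returns 1 if a wider (target or host based) handler should skip this
 * step for the LUN, because a narrower handler already performed it or
 * because the target based handler already reset the target.
 */
static int step_should_skip(const struct lun_eh_flags *lun,
			    int target_reset_done, enum lun_step step)
{
	if (target_reset_done)
		return 1;

	switch (step) {
	case STEP_GET_SENSE:
		return lun->get_sense_done;
	case STEP_STU:
		return lun->stu_done;
	case STEP_LUN_RESET:
		return lun->reset_done;
	}
	return 0;
}

/* Clear every flag, as done once the wider handler has finished. */
static void lun_flags_clear(struct lun_eh_flags *lun)
{
	lun->get_sense_done = 0;
	lun->stu_done = 0;
	lun->reset_done = 0;
}
```

In this model a LUN whose handler already tried START_UNIT is skipped by
the host level START_UNIT pass but is still eligible for steps that have
not been marked done.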
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 54 ++++++++++++++++++++++++++++++++++++++
include/scsi/scsi_device.h | 28 ++++++++++++++++++++
2 files changed, 82 insertions(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4cd6847e90cf..9fcfcc682b02 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -57,10 +57,49 @@
#define BUS_RESET_SETTLE_TIME (10)
#define HOST_RESET_SETTLE_TIME (10)
+#define sdev_flags_done(flag) \
+static inline int sdev_##flag(struct scsi_device *sdev) \
+{ \
+ struct scsi_device_eh *eh = sdev->eh; \
+ if (!eh) \
+ return 0; \
+ return eh->flag; \
+}
+
static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
static enum scsi_disposition scsi_try_to_abort_cmd(const struct scsi_host_template *,
struct scsi_cmnd *);
+sdev_flags_done(get_sense_done);
+sdev_flags_done(stu_done);
+sdev_flags_done(reset_done);
+
+static inline int starget_reset_done(struct scsi_target *starget)
+{
+ struct scsi_target_eh *eh = starget->eh;
+ if (!eh)
+ return 0;
+ return eh->reset_done;
+}
+
+static inline void shost_clear_eh_done(struct Scsi_Host *shost)
+{
+ struct scsi_device *sdev;
+ struct scsi_target *starget;
+
+ list_for_each_entry(starget, &shost->__targets, siblings)
+ if (starget->eh)
+ starget->eh->reset_done = 0;
+
+ shost_for_each_device(sdev, shost) {
+ if (!sdev->eh)
+ continue;
+ sdev->eh->get_sense_done = 0;
+ sdev->eh->stu_done = 0;
+ sdev->eh->reset_done = 0;
+ }
+}
+
void scsi_eh_wakeup(struct Scsi_Host *shost)
{
lockdep_assert_held(shost->host_lock);
@@ -1387,6 +1426,9 @@ int scsi_eh_get_sense(struct list_head *work_q,
current->comm));
break;
}
+ if (sdev_get_sense_done(scmd->device) ||
+ starget_reset_done(scsi_target(scmd->device)))
+ continue;
if (!scsi_status_is_check_condition(scmd->result))
/*
* don't request sense if there's no check condition
@@ -1600,6 +1642,9 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
scsi_device_put(sdev);
break;
}
+ if (sdev_stu_done(sdev) ||
+ starget_reset_done(scsi_target(sdev)))
+ continue;
stu_scmd = NULL;
list_for_each_entry(scmd, work_q, eh_entry)
if (scmd->device == sdev && SCSI_SENSE_VALID(scmd) &&
@@ -1683,6 +1728,9 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
bdr_scmd = scmd;
break;
}
+ if (sdev_reset_done(sdev) ||
+ starget_reset_done(scsi_target(sdev)))
+ continue;
if (!bdr_scmd)
continue;
@@ -1731,6 +1779,11 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
}
scmd = list_entry(tmp_list.next, struct scsi_cmnd, eh_entry);
+ if (starget_reset_done(scsi_target(scmd->device))) {
+ /* push back on work queue for further processing */
+ list_move(&scmd->eh_entry, work_q);
+ continue;
+ }
id = scmd_id(scmd);
SCSI_LOG_ERROR_RECOVERY(3,
@@ -2344,6 +2397,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
+ shost_clear_eh_done(shost);
spin_lock_irqsave(shost->host_lock, flags);
if (shost->eh_deadline != -1)
shost->last_reset = 0;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 1894ba1c82cd..2a01e2bbff0d 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -108,6 +108,24 @@ struct scsi_device;
struct scsi_target;
struct scsi_device_eh {
+ /*
+ * LUN based error handling mainly performs three
+ * steps to recover commands:
+ * check sense
+ * start unit
+ * reset lun
+ * It may fall back to target or host based error handling,
+ * which performs these steps too. Add flags to mark these steps
+ * as done to avoid repeating them.
+ *
+ * The flags should be cleared when the LUN based error handler is
+ * woken up or when target/host based error handling finishes, and
+ * set when falling back to target or host based error handling.
+ */
+ unsigned get_sense_done:1;
+ unsigned stu_done:1;
+ unsigned reset_done:1;
+
/*
 * add scsi command to error handler so it would be handled by
* driver's error handle strategy
@@ -139,6 +157,16 @@ struct scsi_device_eh {
};
struct scsi_target_eh {
+ /*
+ * flag to mark that target reset is done, to avoid repeating
+ * this step when falling back to host based error handling
+ *
+ * The flag should be cleared when the target based error handler
+ * is woken up or when host based error handling finishes, and
+ * set when falling back to host based error handling.
+ */
+ unsigned reset_done:1;
+
/*
 * add scsi command to error handler so it would be handled by
* driver's error handle strategy
--
2.35.3
* [PATCH 07/13] scsi:scsi_error: Define helper to perform LUN based error handle
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Add a simple LUN based error handling strategy. It tries LUN based
recovery first, including check sense, start unit and LUN reset.
If these steps cannot recover all commands, it falls back to
target or host based error handling.
This is a simple error handling strategy that can be used by drivers
or by other LUN based error handlers.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 37 +++++++++++++++++++++++++++++++++++++
include/scsi/scsi_eh.h | 2 ++
2 files changed, 39 insertions(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 9fcfcc682b02..00da77f3f3f8 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2494,6 +2494,43 @@ int scsi_error_handler(void *data)
return 0;
}
+/*
+ * Single LUN error handle
+ *
+ * @work_q: list of scsi commands that need recovery
+ * @done_q: list of scsi commands already handled
+ *
+ * return: 1 if all commands in work_q are recovered, otherwise 0
+ */
+int scsi_sdev_eh(struct scsi_device *sdev,
+ struct list_head *work_q,
+ struct list_head *done_q)
+{
+ int ret = 0;
+ struct scsi_cmnd *scmd;
+
+ SCSI_LOG_ERROR_RECOVERY(2, sdev_printk(KERN_INFO, sdev,
+ "%s:luneh: checking sense\n", current->comm));
+ ret = scsi_eh_get_sense(work_q, done_q);
+ if (ret)
+ return ret;
+
+ SCSI_LOG_ERROR_RECOVERY(2, sdev_printk(KERN_INFO, sdev,
+ "%s:luneh: start unit\n", current->comm));
+ scmd = list_first_entry(work_q, struct scsi_cmnd, eh_entry);
+ ret = scsi_eh_sdev_stu(scmd, work_q, done_q);
+ if (ret)
+ return ret;
+
+ SCSI_LOG_ERROR_RECOVERY(2, sdev_printk(KERN_INFO, sdev,
+ "%s:luneh reset LUN\n", current->comm));
+ scmd = list_first_entry(work_q, struct scsi_cmnd, eh_entry);
+ ret = scsi_eh_sdev_reset(scmd, work_q, done_q);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(scsi_sdev_eh);
+
/*
* Function: scsi_report_bus_reset()
*
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 1ae08e81339f..030b22a8c164 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -18,6 +18,8 @@ extern int scsi_block_when_processing_errors(struct scsi_device *);
extern bool scsi_command_normalize_sense(const struct scsi_cmnd *cmd,
struct scsi_sense_hdr *sshdr);
extern enum scsi_disposition scsi_check_sense(struct scsi_cmnd *);
+extern int scsi_sdev_eh(struct scsi_device *, struct list_head *,
+ struct list_head *);
static inline bool scsi_sense_is_deferred(const struct scsi_sense_hdr *sshdr)
{
--
2.35.3
* [PATCH 08/13] scsi:scsi_error: Add LUN based error handler based previous helper
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Add a LUN based error handler. A driver can call scsi_device_setup_eh() in
its slave_alloc() to set up its LUN based error handler, and call
scsi_device_clear_eh() in its slave_destroy() to clear it.
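The handler in this patch is only scheduled once every outstanding command on the device has failed into it. A minimal userspace sketch of that test, modelled on sdev_eh_wakeup() in the diff; luneh_should_wake is a hypothetical name and the two counters are plain integers here rather than the locked eh_num and scsi_device_busy() values:

```c
#include <stdbool.h>

/*
 * Userspace model of the wake-up test in sdev_eh_wakeup():
 * nr_error is the number of commands queued for error handling,
 * nr_busy is the number of commands the device still has outstanding.
 */
static bool luneh_should_wake(unsigned int nr_error, unsigned int nr_busy)
{
	/* Nothing failed, or some commands are still in flight: wait. */
	if (nr_error == 0 || nr_busy != nr_error)
		return false;
	return true;
}
```

Waiting until the two counts match means recovery starts only when no command on the LUN can still complete normally.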
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 152 ++++++++++++++++++++++++++++++++++++++
drivers/scsi/scsi_priv.h | 2 +
2 files changed, 154 insertions(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 00da77f3f3f8..bb6f05ba199b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2743,3 +2743,155 @@ bool scsi_get_sense_info_fld(const u8 *sense_buffer, int sb_len,
}
}
EXPORT_SYMBOL(scsi_get_sense_info_fld);
+
+struct scsi_lun_eh {
+ spinlock_t eh_lock;
+ unsigned int eh_num;
+ struct list_head eh_cmd_q;
+ struct scsi_device *sdev;
+ struct work_struct eh_handle_work;
+};
+
+/*
+ * LUN based error handling strategy. The following steps
+ * are applied to recover the error commands in the list:
+ * check sense data
+ * send start unit
+ * reset LUN
+ * If error commands remain, fall back to target based or
+ * host based error handling for further recovery.
+ */
+static void sdev_eh_work(struct work_struct *work)
+{
+ unsigned long flags;
+ struct scsi_lun_eh *luneh =
+ container_of(work, struct scsi_lun_eh, eh_handle_work);
+ struct scsi_device *sdev = luneh->sdev;
+ struct scsi_device_eh *eh = sdev->eh;
+ struct Scsi_Host *shost = sdev->host;
+ struct scsi_cmnd *scmd, *next;
+ LIST_HEAD(eh_work_q);
+ LIST_HEAD(eh_done_q);
+
+ spin_lock_irqsave(&luneh->eh_lock, flags);
+ list_splice_init(&luneh->eh_cmd_q, &eh_work_q);
+ spin_unlock_irqrestore(&luneh->eh_lock, flags);
+
+ if (scsi_sdev_eh(sdev, &eh_work_q, &eh_done_q))
+ goto out_flush_done;
+
+ /*
+ * fall back to target or host based error handling
+ */
+ SCSI_LOG_ERROR_RECOVERY(2, sdev_printk(KERN_INFO, sdev,
+ "%s:luneh fallback to further recovery\n", current->comm));
+ list_for_each_entry_safe(scmd, next, &eh_work_q, eh_entry) {
+ list_del_init(&scmd->eh_entry);
+
+ if (scsi_host_in_recovery(shost) ||
+ __scsi_eh_scmd_add_starget(scmd))
+ __scsi_eh_scmd_add(scmd);
+ }
+
+ eh->get_sense_done = 1;
+ eh->stu_done = 1;
+ eh->reset_done = 1;
+
+out_flush_done:
+ scsi_eh_flush_done_q(&eh_done_q);
+ spin_lock_irqsave(&luneh->eh_lock, flags);
+ luneh->eh_num = 0;
+ spin_unlock_irqrestore(&luneh->eh_lock, flags);
+}
+static void sdev_eh_add_cmnd(struct scsi_cmnd *scmd)
+{
+ unsigned long flags;
+ struct scsi_lun_eh *luneh;
+ struct scsi_device *sdev = scmd->device;
+
+ luneh = (struct scsi_lun_eh *)sdev->eh->driver_data;
+
+ spin_lock_irqsave(&luneh->eh_lock, flags);
+ list_add_tail(&scmd->eh_entry, &luneh->eh_cmd_q);
+ luneh->eh_num++;
+ spin_unlock_irqrestore(&luneh->eh_lock, flags);
+}
+static int sdev_eh_is_busy(struct scsi_device *sdev)
+{
+ int ret = 0;
+ unsigned long flags;
+ struct scsi_lun_eh *luneh;
+
+ if (!sdev->eh)
+ return 0;
+
+ luneh = (struct scsi_lun_eh *)sdev->eh->driver_data;
+
+ spin_lock_irqsave(&luneh->eh_lock, flags);
+ ret = luneh->eh_num;
+ spin_unlock_irqrestore(&luneh->eh_lock, flags);
+
+ return ret;
+}
+static int sdev_eh_wakeup(struct scsi_device *sdev)
+{
+ unsigned long flags;
+ unsigned int nr_error;
+ unsigned int nr_busy;
+ struct scsi_lun_eh *luneh;
+
+ luneh = (struct scsi_lun_eh *)sdev->eh->driver_data;
+
+ spin_lock_irqsave(&luneh->eh_lock, flags);
+ nr_error = luneh->eh_num;
+ spin_unlock_irqrestore(&luneh->eh_lock, flags);
+
+ nr_busy = scsi_device_busy(sdev);
+
+ if (!nr_error || nr_busy != nr_error) {
+ SCSI_LOG_ERROR_RECOVERY(5, sdev_printk(KERN_INFO, sdev,
+ "%s:luneh: do not wake up, busy/error: %d/%d\n",
+ current->comm, nr_busy, nr_error));
+ return 0;
+ }
+
+ SCSI_LOG_ERROR_RECOVERY(2, sdev_printk(KERN_INFO, sdev,
+ "%s:luneh: waking up, busy/error: %d/%d\n",
+ current->comm, nr_busy, nr_error));
+
+ return schedule_work(&luneh->eh_handle_work);
+}
+
+int scsi_device_setup_eh(struct scsi_device *sdev)
+{
+ struct scsi_device_eh *eh;
+ struct scsi_lun_eh *luneh;
+
+ eh = kzalloc(sizeof(struct scsi_device_eh) + sizeof(struct scsi_lun_eh),
+ GFP_KERNEL);
+ if (!eh) {
+ sdev_printk(KERN_ERR, sdev, "failed to setup error handle\n");
+ return -ENOMEM;
+ }
+ luneh = (struct scsi_lun_eh *)eh->driver_data;
+
+ eh->add_cmnd = sdev_eh_add_cmnd;
+ eh->is_busy = sdev_eh_is_busy;
+ eh->wakeup = sdev_eh_wakeup;
+
+ luneh->sdev = sdev;
+ spin_lock_init(&luneh->eh_lock);
+ INIT_LIST_HEAD(&luneh->eh_cmd_q);
+ INIT_WORK(&luneh->eh_handle_work, sdev_eh_work);
+
+ sdev->eh = eh;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(scsi_device_setup_eh);
+
+void scsi_device_clear_eh(struct scsi_device *sdev)
+{
+ kfree(sdev->eh);
+}
+EXPORT_SYMBOL_GPL(scsi_device_clear_eh);
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 484c2f61ffe7..7d7d95a6f526 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -101,6 +101,8 @@ int scsi_eh_get_sense(struct list_head *work_q,
struct list_head *done_q);
bool scsi_noretry_cmd(struct scsi_cmnd *scmd);
void scsi_eh_done(struct scsi_cmnd *scmd);
+int scsi_device_setup_eh(struct scsi_device *sdev);
+void scsi_device_clear_eh(struct scsi_device *sdev);
/* scsi_lib.c */
extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
--
2.35.3
* [PATCH 09/13] scsi:core: increase/decrease target_busy without check can_queue
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Increase and decrease target_busy unconditionally, without checking
can_queue. The target based error handler uses this count to decide
whether to wake up.
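As a rough illustration of the accounting change, here is a simplified userspace model. struct target_model, target_queue_ready and target_unbusy are hypothetical names, and the target_blocked and starved-list handling of the real scsi_target_queue_ready() is left out. The point it shows is that the busy count moves even when can_queue is 0, so it always reflects the number of outstanding commands.

```c
/* Hypothetical, simplified model of a target's queue accounting. */
struct target_model {
	int can_queue;   /* <= 0 means no per-target limit */
	int target_busy; /* commands currently outstanding on the target */
};

/* Returns 1 if the command may be queued, 0 if the target is full. */
static int target_queue_ready(struct target_model *t)
{
	int busy = t->target_busy++;	/* counted unconditionally */

	if (t->can_queue <= 0)
		return 1;
	if (busy >= t->can_queue) {
		t->target_busy--;	/* undo the count on rejection */
		return 0;
	}
	return 1;
}

/* Called when a command completes. */
static void target_unbusy(struct target_model *t)
{
	t->target_busy--;
}
```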
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_lib.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index db0a42fe49c0..4a7fb48aa60f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -293,8 +293,7 @@ void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd)
scsi_dec_host_busy(shost, cmd);
- if (starget->can_queue > 0)
- atomic_dec(&starget->target_busy);
+ atomic_dec(&starget->target_busy);
sbitmap_put(&sdev->budget_map, cmd->budget_token);
cmd->budget_token = -1;
@@ -1311,10 +1310,10 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
spin_unlock_irq(shost->host_lock);
}
+ busy = atomic_inc_return(&starget->target_busy) - 1;
if (starget->can_queue <= 0)
return 1;
- busy = atomic_inc_return(&starget->target_busy) - 1;
if (atomic_read(&starget->target_blocked) > 0) {
if (busy)
goto starved;
@@ -1339,8 +1338,7 @@ static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
list_move_tail(&sdev->starved_entry, &shost->starved_list);
spin_unlock_irq(shost->host_lock);
out_dec:
- if (starget->can_queue > 0)
- atomic_dec(&starget->target_busy);
+ atomic_dec(&starget->target_busy);
return 0;
}
@@ -1784,8 +1782,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
out_dec_host_busy:
scsi_dec_host_busy(shost, cmd);
out_dec_target_busy:
- if (scsi_target(sdev)->can_queue > 0)
- atomic_dec(&scsi_target(sdev)->target_busy);
+ atomic_dec(&scsi_target(sdev)->target_busy);
out_put_budget:
scsi_mq_put_budget(q, cmd->budget_token);
cmd->budget_token = -1;
--
2.35.3
* [PATCH 10/13] scsi:scsi_error: Define helper to perform target based error handle
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Add a simple target based error handling strategy. It tries target based
recovery first: check sense, start unit, reset LUN and reset target.
This simple strategy can be used by drivers or by other target based
error handlers.
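The start unit and LUN reset steps walk every device under the target and need one failed command per device to carry the request, which is what the scmd->device == sdev test in starget_eh_stu() selects. A minimal userspace sketch of that selection, with a hypothetical struct failed_cmd and helper first_cmd_for_lun, and with the sense check of the real code left out:

```c
#include <stddef.h>

/* Hypothetical stand-in for a failed command: only its LUN matters here. */
struct failed_cmd {
	int lun;
};

/*
 * Return the index of the first failed command that belongs to the
 * given LUN, or -1 if that LUN has no failed command in the list.
 */
static int first_cmd_for_lun(const struct failed_cmd *cmds, size_t n, int lun)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (cmds[i].lun == lun)
			return (int)i;
	}
	return -1;
}
```

A result of -1 corresponds to the "continue" in the kernel loops: a device with no failed command is skipped.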
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 129 ++++++++++++++++++++++++++++++++++++++
include/scsi/scsi_eh.h | 2 +
2 files changed, 131 insertions(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index bb6f05ba199b..6ebf62f9817a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2531,6 +2531,135 @@ int scsi_sdev_eh(struct scsi_device *sdev,
}
EXPORT_SYMBOL_GPL(scsi_sdev_eh);
+static int starget_eh_stu(struct scsi_target *starget,
+ struct list_head *work_q,
+ struct list_head *done_q)
+{
+ struct scsi_device *sdev;
+ struct scsi_cmnd *scmd, *stu_scmd;
+
+ list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
+ if (sdev_stu_done(sdev))
+ continue;
+
+ stu_scmd = NULL;
+ list_for_each_entry(scmd, work_q, eh_entry)
+ if (scmd->device == sdev && SCSI_SENSE_VALID(scmd) &&
+ scsi_check_sense(scmd) == FAILED) {
+ stu_scmd = scmd;
+ break;
+ }
+ if (!stu_scmd)
+ continue;
+
+ if (scsi_eh_sdev_stu(stu_scmd, work_q, done_q))
+ return 1;
+ }
+
+ return 0;
+}
+
+static int starget_eh_reset_lun(struct scsi_target *starget,
+ struct list_head *work_q,
+ struct list_head *done_q)
+{
+ struct scsi_device *sdev;
+ struct scsi_cmnd *scmd, *bdr_scmd;
+
+ list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
+ if (sdev_reset_done(sdev))
+ continue;
+
+ bdr_scmd = NULL;
+ list_for_each_entry(scmd, work_q, eh_entry)
+ if (scmd->device == sdev) {
+ bdr_scmd = scmd;
+ break;
+ }
+ if (!bdr_scmd)
+ continue;
+
+ if (scsi_eh_sdev_reset(bdr_scmd, work_q, done_q))
+ return 1;
+ }
+
+ return 0;
+}
+
+static int starget_eh_reset_target(struct scsi_target *starget,
+ struct list_head *work_q,
+ struct list_head *done_q)
+{
+ enum scsi_disposition rtn;
+ struct scsi_cmnd *scmd, *next;
+ LIST_HEAD(check_list);
+
+ scmd = list_first_entry(work_q, struct scsi_cmnd, eh_entry);
+
+ SCSI_LOG_ERROR_RECOVERY(3, starget_printk(KERN_INFO, starget,
+ "%s: Sending target reset\n", current->comm));
+
+ rtn = scsi_try_target_reset(scmd);
+ if (rtn != SUCCESS && rtn != FAST_IO_FAIL) {
+ SCSI_LOG_ERROR_RECOVERY(3, starget_printk(KERN_INFO, starget,
+ "%s: Target reset failed\n",
+ current->comm));
+ return 0;
+ }
+
+ SCSI_LOG_ERROR_RECOVERY(3, starget_printk(KERN_INFO, starget,
+ "%s: Target reset success\n", current->comm));
+
+ list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
+ if (rtn == SUCCESS)
+ list_move_tail(&scmd->eh_entry, &check_list);
+ else if (rtn == FAST_IO_FAIL)
+ scsi_eh_finish_cmd(scmd, done_q);
+ }
+
+ return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
+}
+
+/*
+ * Target based error handle
+ *
+ * @work_q: list of scsi commands that need recovery
+ * @done_q: list of scsi commands already handled
+ *
+ * return: 1 if all commands in work_q are recovered, otherwise 0
+ */
+int scsi_starget_eh(struct scsi_target *starget,
+ struct list_head *work_q,
+ struct list_head *done_q)
+{
+ int ret = 0;
+
+ SCSI_LOG_ERROR_RECOVERY(2, starget_printk(KERN_INFO, starget,
+ "%s:targeteh: checking sense\n", current->comm));
+ ret = scsi_eh_get_sense(work_q, done_q);
+ if (ret)
+ return ret;
+
+ SCSI_LOG_ERROR_RECOVERY(2, starget_printk(KERN_INFO, starget,
+ "%s:targeteh: start unit\n", current->comm));
+ ret = starget_eh_stu(starget, work_q, done_q);
+ if (ret)
+ return ret;
+
+ SCSI_LOG_ERROR_RECOVERY(2, starget_printk(KERN_INFO, starget,
+ "%s:targeteh reset LUN\n", current->comm));
+ ret = starget_eh_reset_lun(starget, work_q, done_q);
+ if (ret)
+ return ret;
+
+ SCSI_LOG_ERROR_RECOVERY(2, starget_printk(KERN_INFO, starget,
+ "%s:targeteh reset target\n", current->comm));
+ ret = starget_eh_reset_target(starget, work_q, done_q);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(scsi_starget_eh);
+
/*
* Function: scsi_report_bus_reset()
*
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 030b22a8c164..f8f3a143d848 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -20,6 +20,8 @@ extern bool scsi_command_normalize_sense(const struct scsi_cmnd *cmd,
extern enum scsi_disposition scsi_check_sense(struct scsi_cmnd *);
extern int scsi_sdev_eh(struct scsi_device *, struct list_head *,
struct list_head *);
+extern int scsi_starget_eh(struct scsi_target *, struct list_head *,
+ struct list_head *);
static inline bool scsi_sense_is_deferred(const struct scsi_sense_hdr *sshdr)
{
--
2.35.3
* [PATCH 11/13] scsi:scsi_error: Add target based error handler based previous helper
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Add a target based error handler. A driver can call scsi_target_setup_eh()
in its target_alloc() to set up its target based error handler, and call
scsi_target_clear_eh() in its target_destroy() to clear it.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 154 ++++++++++++++++++++++++++++++++++++++
drivers/scsi/scsi_priv.h | 2 +
2 files changed, 156 insertions(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 6ebf62f9817a..b64828c5c9ee 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -100,6 +100,19 @@ static inline void shost_clear_eh_done(struct Scsi_Host *shost)
}
}
+static inline void starget_clear_eh_done(struct scsi_target *starget)
+{
+ struct scsi_device *sdev;
+
+ list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
+ if (!sdev->eh)
+ continue;
+ sdev->eh->get_sense_done = 0;
+ sdev->eh->stu_done = 0;
+ sdev->eh->reset_done = 0;
+ }
+}
+
void scsi_eh_wakeup(struct Scsi_Host *shost)
{
lockdep_assert_held(shost->host_lock);
@@ -3024,3 +3037,144 @@ void scsi_device_clear_eh(struct scsi_device *sdev)
kfree(sdev->eh);
}
EXPORT_SYMBOL_GPL(scsi_device_clear_eh);
+
+struct starget_eh {
+ spinlock_t eh_lock;
+ unsigned int eh_num;
+ struct list_head eh_cmd_q;
+ struct scsi_target *starget;
+ struct work_struct eh_handle_work;
+ unsigned try_reset_done:1;
+};
+
+static void starget_eh_work(struct work_struct *work)
+{
+ struct scsi_cmnd *scmd, *next;
+ unsigned long flags;
+ LIST_HEAD(eh_work_q);
+ LIST_HEAD(eh_done_q);
+ struct starget_eh *stargeteh =
+ container_of(work, struct starget_eh, eh_handle_work);
+ struct scsi_target *starget = stargeteh->starget;
+ struct scsi_target_eh *eh = starget->eh;
+
+ spin_lock_irqsave(&stargeteh->eh_lock, flags);
+ list_splice_init(&stargeteh->eh_cmd_q, &eh_work_q);
+ spin_unlock_irqrestore(&stargeteh->eh_lock, flags);
+
+ if (scsi_starget_eh(starget, &eh_work_q, &eh_done_q))
+ goto out_clear_flag;
+
+ /*
+ * fall back to host based error handling
+ */
+ SCSI_LOG_ERROR_RECOVERY(2, starget_printk(KERN_INFO, starget,
+ "%s:targeteh fallback to further recovery\n", current->comm));
+ eh->reset_done = 1;
+ list_for_each_entry_safe(scmd, next, &eh_work_q, eh_entry) {
+ list_del_init(&scmd->eh_entry);
+ __scsi_eh_scmd_add(scmd);
+ }
+ goto out_flush_done;
+
+out_clear_flag:
+ starget_clear_eh_done(starget);
+
+out_flush_done:
+ scsi_eh_flush_done_q(&eh_done_q);
+ spin_lock_irqsave(&stargeteh->eh_lock, flags);
+ stargeteh->eh_num = 0;
+ spin_unlock_irqrestore(&stargeteh->eh_lock, flags);
+}
+
+static void starget_eh_add_cmnd(struct scsi_cmnd *scmd)
+{
+ unsigned long flags;
+ struct scsi_target *starget = scmd->device->sdev_target;
+ struct starget_eh *eh;
+
+ eh = (struct starget_eh *)starget->eh->driver_data;
+
+ spin_lock_irqsave(&eh->eh_lock, flags);
+ list_add_tail(&scmd->eh_entry, &eh->eh_cmd_q);
+ eh->eh_num++;
+ spin_unlock_irqrestore(&eh->eh_lock, flags);
+}
+
+static int starget_eh_is_busy(struct scsi_target *starget)
+{
+ int ret = 0;
+ unsigned long flags;
+ struct starget_eh *eh;
+
+ eh = (struct starget_eh *)starget->eh->driver_data;
+
+ spin_lock_irqsave(&eh->eh_lock, flags);
+ ret = eh->eh_num;
+ spin_unlock_irqrestore(&eh->eh_lock, flags);
+
+ return ret;
+}
+
+static int starget_eh_wakeup(struct scsi_target *starget)
+{
+ unsigned long flags;
+ unsigned int nr_error;
+ unsigned int nr_busy;
+ struct starget_eh *eh;
+
+ eh = (struct starget_eh *)starget->eh->driver_data;
+
+ spin_lock_irqsave(&eh->eh_lock, flags);
+ nr_error = eh->eh_num;
+ spin_unlock_irqrestore(&eh->eh_lock, flags);
+
+ nr_busy = atomic_read(&starget->target_busy);
+
+ if (!nr_error || nr_busy != nr_error) {
+ SCSI_LOG_ERROR_RECOVERY(5, starget_printk(KERN_INFO, starget,
+ "%s:targeteh: do not wake up, busy/error is %d/%d\n",
+ current->comm, nr_busy, nr_error));
+ return 0;
+ }
+
+ SCSI_LOG_ERROR_RECOVERY(2, starget_printk(KERN_INFO, starget,
+ "%s:targeteh: waking up, busy/error is %d/%d\n",
+ current->comm, nr_busy, nr_error));
+
+ return schedule_work(&eh->eh_handle_work);
+}
+
+int scsi_target_setup_eh(struct scsi_target *starget)
+{
+ struct scsi_target_eh *eh;
+ struct starget_eh *stargeteh;
+
+ eh = kzalloc(sizeof(struct scsi_target_eh) + sizeof(struct starget_eh),
+ GFP_KERNEL);
+ if (!eh) {
+ starget_printk(KERN_ERR, starget, "failed to setup eh\n");
+ return -ENOMEM;
+ }
+ stargeteh = (struct starget_eh *)eh->driver_data;
+
+ eh->add_cmnd = starget_eh_add_cmnd;
+ eh->is_busy = starget_eh_is_busy;
+ eh->wakeup = starget_eh_wakeup;
+ stargeteh->starget = starget;
+
+ spin_lock_init(&stargeteh->eh_lock);
+ INIT_LIST_HEAD(&stargeteh->eh_cmd_q);
+ INIT_WORK(&stargeteh->eh_handle_work, starget_eh_work);
+
+ starget->eh = eh;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(scsi_target_setup_eh);
+
+void scsi_target_clear_eh(struct scsi_target *starget)
+{
+ kfree(starget->eh);
+}
+EXPORT_SYMBOL_GPL(scsi_target_clear_eh);
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 7d7d95a6f526..12b8d7cedd1e 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -103,6 +103,8 @@ bool scsi_noretry_cmd(struct scsi_cmnd *scmd);
void scsi_eh_done(struct scsi_cmnd *scmd);
int scsi_device_setup_eh(struct scsi_device *sdev);
void scsi_device_clear_eh(struct scsi_device *sdev);
+int scsi_target_setup_eh(struct scsi_target *starget);
+void scsi_target_clear_eh(struct scsi_target *starget);
/* scsi_lib.c */
extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
--
2.35.3
* [PATCH 12/13] scsi:scsi_debug: Add param to control if setup LUN based error handle
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Add a new module parameter, lun_eh, to control whether the LUN based
error handler is set up. It is used to test LUN based error handling.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_debug.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 2c5ed618f228..7ab57fc30301 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -58,6 +58,7 @@
#include "sd.h"
#include "scsi_logging.h"
+#include "scsi_priv.h"
/* make sure inq_product_rev string corresponds to this version */
#define SDEBUG_VERSION "0191" /* format to fit INQUIRY revision field */
@@ -840,6 +841,7 @@ static bool have_dif_prot;
static bool write_since_sync;
static bool sdebug_statistics = DEF_STATISTICS;
static bool sdebug_wp;
+static bool sdebug_lun_eh;
/* Following enum: 0: no zbc, def; 1: host aware; 2: host managed */
static enum blk_zoned_model sdeb_zbc_model = BLK_ZONED_NONE;
static char *sdeb_zbc_model_s;
@@ -5423,6 +5425,9 @@ static int scsi_debug_slave_alloc(struct scsi_device *sdp)
pr_info("slave_alloc <%u %u %u %llu>\n",
sdp->host->host_no, sdp->channel, sdp->id, sdp->lun);
+ if (sdebug_lun_eh)
+ return scsi_device_setup_eh(sdp);
+
return 0;
}
@@ -5477,6 +5482,9 @@ static void scsi_debug_slave_destroy(struct scsi_device *sdp)
/* make this slot available for re-use */
devip->used = false;
sdp->hostdata = NULL;
+
+ if (sdebug_lun_eh)
+ scsi_device_clear_eh(sdp);
}
/* Returns true if we require the queued memory to be freed by the caller. */
@@ -6153,6 +6161,7 @@ module_param_named(zone_cap_mb, sdeb_zbc_zone_cap_mb, int, S_IRUGO);
module_param_named(zone_max_open, sdeb_zbc_max_open, int, S_IRUGO);
module_param_named(zone_nr_conv, sdeb_zbc_nr_conv, int, S_IRUGO);
module_param_named(zone_size_mb, sdeb_zbc_zone_size_mb, int, S_IRUGO);
+module_param_named(lun_eh, sdebug_lun_eh, bool, S_IRUGO);
MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -6225,6 +6234,7 @@ MODULE_PARM_DESC(zone_cap_mb, "Zone capacity in MiB (def=zone size)");
MODULE_PARM_DESC(zone_max_open, "Maximum number of open zones; [0] for no limit (def=auto)");
MODULE_PARM_DESC(zone_nr_conv, "Number of conventional zones (def=1)");
MODULE_PARM_DESC(zone_size_mb, "Zone size in MiB (def=auto)");
+MODULE_PARM_DESC(lun_eh, "LUN based error handle (def=0)");
#define SDEBUG_INFO_LEN 256
static char sdebug_info[SDEBUG_INFO_LEN];
--
2.35.3
* [PATCH 13/13] scsi:scsi_debug: Add param to control if setup target based error handle
From: Wenchao Hao @ 2023-07-23 23:44 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang, Wenchao Hao
Add a new module parameter, target_eh, to control whether the target based
error handler is set up. It is used to test target based error handling.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_debug.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 7ab57fc30301..31105d7fb562 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -842,6 +842,7 @@ static bool write_since_sync;
static bool sdebug_statistics = DEF_STATISTICS;
static bool sdebug_wp;
static bool sdebug_lun_eh;
+static bool sdebug_target_eh;
/* Following enum: 0: no zbc, def; 1: host aware; 2: host managed */
static enum blk_zoned_model sdeb_zbc_model = BLK_ZONED_NONE;
static char *sdeb_zbc_model_s;
@@ -1131,6 +1132,9 @@ static int sdebug_target_alloc(struct scsi_target *starget)
starget->hostdata = targetip;
+ if (sdebug_target_eh)
+ return scsi_target_setup_eh(starget);
+
return 0;
}
@@ -1138,6 +1142,9 @@ static void sdebug_target_destroy(struct scsi_target *starget)
{
struct sdebug_target_info *targetip;
+ if (sdebug_target_eh)
+ scsi_target_clear_eh(starget);
+
targetip = (struct sdebug_target_info *)starget->hostdata;
if (targetip) {
debugfs_remove(targetip->debugfs_entry);
@@ -6162,6 +6169,7 @@ module_param_named(zone_max_open, sdeb_zbc_max_open, int, S_IRUGO);
module_param_named(zone_nr_conv, sdeb_zbc_nr_conv, int, S_IRUGO);
module_param_named(zone_size_mb, sdeb_zbc_zone_size_mb, int, S_IRUGO);
module_param_named(lun_eh, sdebug_lun_eh, bool, S_IRUGO);
+module_param_named(target_eh, sdebug_target_eh, bool, S_IRUGO);
MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -6235,6 +6243,7 @@ MODULE_PARM_DESC(zone_max_open, "Maximum number of open zones; [0] for no limit
MODULE_PARM_DESC(zone_nr_conv, "Number of conventional zones (def=1)");
MODULE_PARM_DESC(zone_size_mb, "Zone size in MiB (def=auto)");
MODULE_PARM_DESC(lun_eh, "LUN based error handle (def=0)");
+MODULE_PARM_DESC(target_eh, "target based error handle (def=0)");
#define SDEBUG_INFO_LEN 256
static char sdebug_info[SDEBUG_INFO_LEN];
--
2.35.3
* Re: [PATCH 00/13] scsi: Support LUN/target based error handle
From: haowenchao (C) @ 2023-08-15 14:08 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang
[-- Attachment #1: Type: text/plain, Size: 7741 bytes --]
On 2023/7/24 7:44, Wenchao Hao wrote:
> The origin error handle would set host to recovery state and perform
> error recovery operations, and makes all LUNs which share a same host
> can not handle IOs. This phenomenon is unbearable for systems which
> deploy many LUNs in one HBA.
>
> This patchset introduce support for LUN/target based error handle,
> drivers can chose if to implement it. They can implement LUN, target or
> both of LUN and target based error handle by their own error handle
> strategy. The first patch defined this framework, it abstract three
> key operations which are: add error command, wake up error handle, block
> ios when error command is added and recoverying. Drivers should
> implement these three function callbacks and setup to SCSI middle level.
>
> Besides the basic framework, this patchset also add a basic LUN/target
> based error handle strategy.
>
> For LUN based eh, it would try check sense, start unit and reset LUN,
> if all above steps can not recovery all error commands, fallback to
> further recovery like tartget based (if implemented) or host based error
> handle.
>
> It's same for tartget based eh, it would try check sense, start unit,
> reset LUN and reset target. If all above steps can not recovery all error
> commands, fallback to further recovery which is host based error handle.
>
> This patchset is tested by scsi_debug which support single LUN error
> injection, the scsi_debug patches is here:
>
> https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
>
I tested this patch set with scsi_debug in the following scenarios; see the
attachments for my test script and result logs.
+-----------+---------+-------------------------------------------------------+
| lun reset | TUR     | Desired result                                        |
+-----------+---------+-------------------------------------------------------+
| success   | success | retry or finish with EIO(may offline disk)            |
+-----------+---------+-------------------------------------------------------+
| success   | fail    | fallback to host recovery, retry or finish with       |
|           |         | EIO(may offline disk)                                 |
+-----------+---------+-------------------------------------------------------+
| fail      | NA      | fallback to host recovery, retry or finish with       |
|           |         | EIO(may offline disk)                                 |
+-----------+---------+-------------------------------------------------------+
+-----------+---------+--------------+---------+------------------------------+
| lun reset | TUR     | target reset | TUR     | Desired result               |
+-----------+---------+--------------+---------+------------------------------+
| success   | success | NA           | NA      | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| success   | fail    | success      | success | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | success | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | fail    | fallback to host recovery,   |
|           |         |              |         | retry or finish with EIO(may |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | fail         | NA      | fallback to host recovery,   |
|           |         |              |         | retry or finish with EIO(may |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
Both LUN and target based error handle (lun_eh=Y target_eh=Y):
+-----------+---------+--------------+---------+------------------------------+
| lun reset | TUR     | target reset | TUR     | Desired result               |
+-----------+---------+--------------+---------+------------------------------+
| success   | success | NA           | NA      | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| success   | fail    | success      | success | lun recovery fallback to     |
|           |         |              |         | target recovery, retry or    |
|           |         |              |         | finish with EIO(may offline  |
|           |         |              |         | disk)                        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | success | lun recovery fallback to     |
|           |         |              |         | target recovery, retry or    |
|           |         |              |         | finish with EIO(may offline  |
|           |         |              |         | disk)                        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | fail    | lun recovery fallback to     |
|           |         |              |         | target recovery, then fall   |
|           |         |              |         | back to host recovery, retry |
|           |         |              |         | or finish with EIO(may       |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | fail         | NA      | lun recovery fallback to     |
|           |         |              |         | target recovery, then fall   |
|           |         |              |         | back to host recovery, retry |
|           |         |              |         | or finish with EIO(may       |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
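The last table (both LUN and target based error handle enabled) can be read
as a small decision function. The sketch below is only an illustration of
that table and is not part of the attached test script; the function name
expected_path and the path labels it prints are hypothetical.

```shell
# Hypothetical helper: map the injected step results to the recovery path
# the table expects when both LUN and target based error handle are set up.
# Arguments are "success", "fail" or "NA", in table column order.
expected_path()
{
	lun_reset=$1
	lun_tur=$2
	tgt_reset=$3
	tgt_tur=$4

	# LUN reset worked and the following TUR passed: LUN recovery is enough.
	if [ "$lun_reset" = success ] && [ "$lun_tur" = success ]; then
		echo "lun"
		return
	fi
	# Otherwise LUN recovery falls back to target recovery.
	if [ "$tgt_reset" = success ] && [ "$tgt_tur" = success ]; then
		echo "lun->target"
		return
	fi
	# Target recovery did not help either: fall back to host recovery.
	echo "lun->target->host"
}

expected_path success success NA NA
expected_path fail NA success fail
```

In every case the command is then retried or finished with EIO, as the
"Desired result" column says; the function only names how far recovery has
to escalate before that happens.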
> Wenchao Hao (13):
> scsi: Define basic framework for driver LUN/target based error handle
> scsi:scsi_error: Move complete variable eh_action from shost to sdevice
> scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
> scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
> scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
> scsi:scsi_error: Add flags to mark error handle steps has done
> scsi:scsi_error: Define helper to perform LUN based error handle
> scsi:scsi_error: Add LUN based error handler based previous helper
> scsi:core: increase/decrease target_busy without check can_queue
> scsi:scsi_error: Define helper to perform target based error handle
> scsi:scsi_error: Add target based error handler based previous helper
> scsi:scsi_debug: Add param to control if setup LUN based error handle
> scsi:scsi_debug: Add param to control if setup target based error handle
>
> drivers/scsi/scsi_debug.c | 19 +
> drivers/scsi/scsi_error.c | 705 ++++++++++++++++++++++++++++++++++---
> drivers/scsi/scsi_lib.c | 23 +-
> drivers/scsi/scsi_priv.h | 20 ++
> include/scsi/scsi_device.h | 97 +++++
> include/scsi/scsi_eh.h | 4 +
> include/scsi/scsi_host.h | 2 -
> 7 files changed, 813 insertions(+), 57 deletions(-)
>
[-- Attachment #2: logs.tar.gz --]
[-- Type: application/x-gzip, Size: 7681 bytes --]
[-- Attachment #3: test.sh --]
[-- Type: text/plain, Size: 6362 bytes --]
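#!/bin/bash
# bash is required: the script uses "function" and C-style for loops
scsi_debug=/mnt/mainline/drivers/scsi/scsi_debug.ko
function clear_error()
{
	error=$1
	tmpfile=$$_clear
	# collect columns 1 and 3 of every listed rule, skipping the header line
	cat $error | grep -v Type | awk '{print $1,$3}' > $tmpfile
	# write each rule back prefixed with "-" to clear it
	while read -r line; do echo "- $line" > $error; done < $tmpfile
	rm -rf $tmpfile
	echo 0 > /sys/kernel/debug/scsi_debug/target$target_id/fail_reset
}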
function lun_test_sense1()
{
	echo "LUN reset success, TUR success"
	# inject timeout command for write command
	echo "0 -10 0x2a " > ${error}
	# inject abort command for write command
	echo "3 -1 0x2a " > ${error}
	dd if=/dev/zero of=/dev/$disk bs=1K count=10 oflag=direct
	echo $(cat /sys/block/$disk/device/state)
	clear_error $error
	echo running > /sys/block/$disk/device/state
}
function lun_test_sense2()
{
	echo "LUN reset success, TUR failed"
	# inject timeout command for write command
	echo "0 -10 0x2a " > ${error}
	# inject abort command for write command
	echo "3 -1 0x2a " > ${error}
	# inject timeout command for TUR command
	echo "0 -1 0x0 " > ${error}
	dd if=/dev/zero of=/dev/$disk bs=1K count=10 oflag=direct
	echo $(cat /sys/block/$disk/device/state)
	clear_error $error
	echo running > /sys/block/$disk/device/state
}
function lun_test_sense3()
{
	echo "LUN reset failed, fallback to target reset success"
	# inject timeout command for write command
	echo "0 -10 0x2a " > ${error}
	# inject abort command for write command
	echo "3 -1 0x2a " > ${error}
	# inject lunreset failed
	echo "4 -1 0xff" > ${error}
	dd if=/dev/zero of=/dev/$disk bs=1K count=10 oflag=direct
	echo $(cat /sys/block/$disk/device/state)
	clear_error $error
	echo running > /sys/block/$disk/device/state
}
function target_test_sense1()
{
	echo "LUN reset success, TUR success"
	# inject timeout command for write command
	echo "0 -10 0x2a " > ${error}
	# inject abort command for write command
	echo "3 -1 0x2a " > ${error}
	dd if=/dev/zero of=/dev/$disk bs=1K count=10 oflag=direct
	echo $(cat /sys/block/$disk/device/state)
	clear_error $error
	echo running > /sys/block/$disk/device/state
}
function target_test_sense2()
{
	echo "LUN reset success, TUR failed, target reset success, TUR success"
	# inject timeout command for write command
	echo "0 -10 0x2a " > ${error}
	# inject abort command for write command
	echo "3 -1 0x2a " > ${error}
	# inject timeout command for TUR command
	echo "0 -1 0x0 " > ${error}
	dd if=/dev/zero of=/dev/$disk bs=1K count=10 oflag=direct
	echo $(cat /sys/block/$disk/device/state)
	clear_error $error
	echo running > /sys/block/$disk/device/state
}
function target_test_sense3()
{
	echo "LUN reset failed, target reset success, TUR success"
	# inject timeout command for write command
	echo "0 -10 0x2a " > ${error}
	# inject abort command for write command
	echo "3 -1 0x2a " > ${error}
	# inject lunreset failed
	echo "4 -1 0xff" > ${error}
	dd if=/dev/zero of=/dev/$disk bs=1K count=10 oflag=direct
	echo $(cat /sys/block/$disk/device/state)
	clear_error $error
	echo running > /sys/block/$disk/device/state
}
function target_test_sense4()
{
	echo "LUN reset failed, target reset success TUR failed"
	# inject timeout command for write command
	echo "0 -10 0x2a " > ${error}
	# inject abort command for write command
	echo "3 -1 0x2a " > ${error}
	# inject lunreset failed
	echo "4 -1 0xff" > ${error}
	# inject timeout command for TUR command
	echo "0 -1 0x0 " > ${error}
	dd if=/dev/zero of=/dev/$disk bs=1K count=10 oflag=direct
	echo $(cat /sys/block/$disk/device/state)
	clear_error $error
	echo running > /sys/block/$disk/device/state
}
function target_test_sense5()
{
	echo "LUN reset failed, target reset failed, fallback to host recovery"
	# inject timeout command for write command
	echo "0 -10 0x2a " > ${error}
	# inject abort command for write command
	echo "3 -1 0x2a " > ${error}
	# inject lunreset failed
	echo "4 -1 0xff" > ${error}
	# inject target reset failed
	echo 1 > /sys/kernel/debug/scsi_debug/target$target_id/fail_reset
	dd if=/dev/zero of=/dev/$disk bs=1K count=10 oflag=direct
	echo $(cat /sys/block/$disk/device/state)
	clear_error $error
	echo running > /sys/block/$disk/device/state
}
scsi_logging_level -s --error 4 > /dev/null 2>&1
# round 1: LUN based error handle only
insmod $scsi_debug lun_eh=Y target_eh=N
str=$(lsscsi | grep scsi_debug | head -n 1 | awk '{print $1}')
scsi_id=${str#*\[}
scsi_id=${scsi_id%\]*}
error=/sys/kernel/debug/scsi_debug/$scsi_id/error
str=$(lsscsi | grep scsi_debug | head -n 1 | awk '{print $6}')
disk=$(basename $str)
target_id=${scsi_id%\:*}
echo none > /sys/block/$disk/queue/scheduler
echo 1 > /sys/block/$disk/device/timeout
echo 1 > /sys/block/$disk/device/eh_timeout
for((loop=1;loop<=3;loop++))
do
	time=$(date "+%Y-%m-%d-%H-%M-%S")
	since=$(date "+%Y-%m-%d %H:%M:%S")
	lun_test_sense$loop
	sleep 3
	until=$(date "+%Y-%m-%d %H:%M:%S")
	# -p also creates logs/ itself and tolerates reruns
	mkdir -p logs/lun_sense$loop
	journalctl --since="$since" --until="$until" > logs/lun_sense$loop/$time.log
done
rmmod scsi_debug
# round 2: target based error handle only
insmod $scsi_debug lun_eh=N target_eh=Y
str=$(lsscsi | grep scsi_debug | head -n 1 | awk '{print $1}')
scsi_id=${str#*\[}
scsi_id=${scsi_id%\]*}
error=/sys/kernel/debug/scsi_debug/$scsi_id/error
str=$(lsscsi | grep scsi_debug | head -n 1 | awk '{print $6}')
disk=$(basename $str)
# the module was reloaded, so recompute the target id used for fail_reset
target_id=${scsi_id%\:*}
echo none > /sys/block/$disk/queue/scheduler
echo 1 > /sys/block/$disk/device/timeout
echo 1 > /sys/block/$disk/device/eh_timeout
for((loop=1;loop<=5;loop++))
do
	time=$(date "+%Y-%m-%d-%H-%M-%S")
	since=$(date "+%Y-%m-%d %H:%M:%S")
	target_test_sense$loop
	sleep 3
	until=$(date "+%Y-%m-%d %H:%M:%S")
	mkdir -p logs/target_sense$loop
	journalctl --since="$since" --until="$until" > logs/target_sense$loop/$time.log
done
rmmod scsi_debug
# round 3: both LUN and target based error handle
insmod $scsi_debug lun_eh=Y target_eh=Y
str=$(lsscsi | grep scsi_debug | head -n 1 | awk '{print $1}')
scsi_id=${str#*\[}
scsi_id=${scsi_id%\]*}
error=/sys/kernel/debug/scsi_debug/$scsi_id/error
str=$(lsscsi | grep scsi_debug | head -n 1 | awk '{print $6}')
disk=$(basename $str)
# the module was reloaded, so recompute the target id used for fail_reset
target_id=${scsi_id%\:*}
echo none > /sys/block/$disk/queue/scheduler
echo 1 > /sys/block/$disk/device/timeout
echo 1 > /sys/block/$disk/device/eh_timeout
for((loop=1;loop<=5;loop++))
do
	time=$(date "+%Y-%m-%d-%H-%M-%S")
	since=$(date "+%Y-%m-%d %H:%M:%S")
	target_test_sense$loop
	sleep 3
	until=$(date "+%Y-%m-%d %H:%M:%S")
	mkdir -p logs/lun_target_sense$loop
	journalctl --since="$since" --until="$until" > logs/lun_target_sense$loop/$time.log
done
rmmod scsi_debug
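
The setup block that the script repeats after each insmod derives two debugfs
paths from the first column of the lsscsi output. As a rough sketch of that
parsing only, it could be factored into two helpers. The function names
scsi_id_of and target_id_of are hypothetical, and the sample lsscsi line is
made up for illustration; the parameter expansions and the debugfs paths are
the ones the script already uses.

```shell
# Hypothetical helper: extract "H:C:T:L" from one line of lsscsi output.
scsi_id_of()
{
	# The first field looks like "[0:0:0:0]"; keep it and strip the brackets.
	field="${1%% *}"
	field="${field#*\[}"
	echo "${field%\]*}"
}

# Hypothetical helper: drop the trailing ":LUN" so "0:0:0:0" becomes "0:0:0".
target_id_of()
{
	echo "${1%:*}"
}

# Illustrative input only; a real run would read this from lsscsi.
line='[0:0:0:0]    disk    Linux    scsi_debug    0191    /dev/sda'
scsi_id=$(scsi_id_of "$line")
target_id=$(target_id_of "$scsi_id")
echo "error file: /sys/kernel/debug/scsi_debug/$scsi_id/error"
echo "fail_reset: /sys/kernel/debug/scsi_debug/target$target_id/fail_reset"
```

Calling both helpers after every insmod keeps the error file and the
fail_reset file pointing at the same, freshly created device.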
* Re: [PATCH 00/13] scsi: Support LUN/target based error handle
2023-07-23 23:44 [PATCH 00/13] scsi: Support LUN/target based error handle Wenchao Hao
` (13 preceding siblings ...)
2023-08-15 14:08 ` [PATCH 00/13] scsi: Support LUN/target " haowenchao (C)
@ 2023-08-15 14:17 ` haowenchao (C)
2023-08-15 15:48 ` Bart Van Assche
2023-08-21 13:31 ` haowenchao (C)
2023-08-30 9:45 ` haowenchao (C)
16 siblings, 1 reply; 20+ messages in thread
From: haowenchao (C) @ 2023-08-15 14:17 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang
On 2023/7/24 7:44, Wenchao Hao wrote:
> The original error handler sets the host to the recovery state and
> performs error recovery operations, which prevents all LUNs sharing that
> host from handling I/O. This is unacceptable for systems that deploy
> many LUNs on one HBA.
>
Friendly ping...
With this patchset we can reduce the probability of blocking the whole
host while handling error commands, which is important for servers that
deploy a large number of disks. The new error handler is not enabled by
default, so it does not affect drivers that do not need it.
> This patchset introduces support for LUN/target based error handling;
> drivers can choose whether to implement it. They can implement LUN based,
> target based, or both LUN and target based error handling with their own
> error handling strategy. The first patch defines this framework. It
> abstracts three key operations: add an error command, wake up the error
> handler, and block I/O once an error command has been added and while
> recovery is in progress. Drivers should implement these three callbacks
> and set them up in the SCSI mid layer.
>
> Besides the basic framework, this patchset also adds a basic LUN/target
> based error handling strategy.
>
> For LUN based eh, it tries check sense, start unit and LUN reset. If
> these steps cannot recover all error commands, it falls back to
> further recovery such as target based (if implemented) or host based
> error handling.
>
> The same applies to target based eh: it tries check sense, start unit,
> LUN reset and target reset. If these steps cannot recover all error
> commands, it falls back to host based error handling.
>
> This patchset was tested with scsi_debug, which supports single LUN error
> injection. The scsi_debug patches are here:
>
> https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
>
> Wenchao Hao (13):
> scsi: Define basic framework for driver LUN/target based error handle
> scsi:scsi_error: Move complete variable eh_action from shost to sdevice
> scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
> scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
> scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
> scsi:scsi_error: Add flags to mark error handle steps has done
> scsi:scsi_error: Define helper to perform LUN based error handle
> scsi:scsi_error: Add LUN based error handler based previous helper
> scsi:core: increase/decrease target_busy without check can_queue
> scsi:scsi_error: Define helper to perform target based error handle
> scsi:scsi_error: Add target based error handler based previous helper
> scsi:scsi_debug: Add param to control if setup LUN based error handle
> scsi:scsi_debug: Add param to control if setup target based error handle
>
> drivers/scsi/scsi_debug.c | 19 +
> drivers/scsi/scsi_error.c | 705 ++++++++++++++++++++++++++++++++++---
> drivers/scsi/scsi_lib.c | 23 +-
> drivers/scsi/scsi_priv.h | 20 ++
> include/scsi/scsi_device.h | 97 +++++
> include/scsi/scsi_eh.h | 4 +
> include/scsi/scsi_host.h | 2 -
> 7 files changed, 813 insertions(+), 57 deletions(-)
>
* Re: [PATCH 00/13] scsi: Support LUN/target based error handle
2023-08-15 14:17 ` haowenchao (C)
@ 2023-08-15 15:48 ` Bart Van Assche
2023-08-16 2:14 ` haowenchao (C)
0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2023-08-15 15:48 UTC (permalink / raw)
To: haowenchao (C), James E . J . Bottomley, Martin K . Petersen,
Hannes Reinecke, linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang
On 8/15/23 07:17, haowenchao (C) wrote:
> With this patchset we can reduce the probability of blocking the whole
> host while handling error commands, which is important for servers that
> deploy a large number of disks. The new error handler is not enabled by
> default, so it does not affect drivers that do not need it.
Which drivers need this new error handler? I don't see any changes for
SCSI drivers in this patch series other than scsi_debug. Has this patch
series perhaps been developed for a pass-through driver between virtual
machine guests and their host? If so, has it been considered to
configure pass-through such that there is one disk per SCSI host instead
of multiple?
Thanks,
Bart.
* Re: [PATCH 00/13] scsi: Support LUN/target based error handle
2023-08-15 15:48 ` Bart Van Assche
@ 2023-08-16 2:14 ` haowenchao (C)
0 siblings, 0 replies; 20+ messages in thread
From: haowenchao (C) @ 2023-08-16 2:14 UTC (permalink / raw)
To: Bart Van Assche, James E . J . Bottomley, Martin K . Petersen,
Hannes Reinecke, linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang
On 2023/8/15 23:48, Bart Van Assche wrote:
> On 8/15/23 07:17, haowenchao (C) wrote:
>> With this patchset we can reduce the probability of blocking the whole
>> host while handling error commands, which is important for servers that
>> deploy a large number of disks. The new error handler is not enabled by
>> default, so it does not affect drivers that do not need it.
>
> Which drivers need this new error handler? I don't see any changes for
> SCSI drivers in this patch series other than scsi_debug. Has this patch
> series perhaps been developed for a pass-through driver between virtual
> machine guests and their host? If so, has it been considered to
> configure pass-through such that there is one disk per SCSI host instead
> of multiple?
>
I tested the error handler with our private hardware (the driver code has
not been pushed to mainline). As discussed, megaraid_sas, mpt3sas, smartpqi,
hiraid and hisi_sas need this new error handler too, although hisi_sas
needs more work to use it because it is tightly coupled with
libsas/libata. I want the basic framework to be reviewed first, so I only
modified scsi_debug, which is accessible to everyone and makes it easy to
simulate various kinds of errors.
I do not know how a pass-through driver between virtual machine guests
and their host works. Do you mean virtio-scsi in the guest OS?
Can you describe it in more detail?
Thanks.
> Thanks,
>
> Bart.
>
>
* Re: [PATCH 00/13] scsi: Support LUN/target based error handle
2023-07-23 23:44 [PATCH 00/13] scsi: Support LUN/target based error handle Wenchao Hao
` (14 preceding siblings ...)
2023-08-15 14:17 ` haowenchao (C)
@ 2023-08-21 13:31 ` haowenchao (C)
2023-08-30 9:45 ` haowenchao (C)
16 siblings, 0 replies; 20+ messages in thread
From: haowenchao (C) @ 2023-08-21 13:31 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel
Cc: Dan Carpenter, louhongxiang
On 2023/7/24 7:44, Wenchao Hao wrote:
> The original error handler sets the host to the recovery state and
> performs error recovery operations, which prevents all LUNs sharing that
> host from handling I/O. This is unacceptable for systems that deploy
> many LUNs on one HBA.
>
> This patchset introduces support for LUN/target based error handling;
> drivers can choose whether to implement it. They can implement LUN based,
> target based, or both LUN and target based error handling with their own
> error handling strategy. The first patch defines this framework. It
> abstracts three key operations: add an error command, wake up the error
> handler, and block I/O once an error command has been added and while
> recovery is in progress. Drivers should implement these three callbacks
> and set them up in the SCSI mid layer.
>
Ping...
Is anyone reviewing these changes?
> Besides the basic framework, this patchset also adds a basic LUN/target
> based error handling strategy.
>
> For LUN based eh, it tries check sense, start unit and LUN reset. If
> these steps cannot recover all error commands, it falls back to
> further recovery such as target based (if implemented) or host based
> error handling.
>
> The same applies to target based eh: it tries check sense, start unit,
> LUN reset and target reset. If these steps cannot recover all error
> commands, it falls back to host based error handling.
>
> This patchset was tested with scsi_debug, which supports single LUN error
> injection. The scsi_debug patches are here:
>
> https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
>
> Wenchao Hao (13):
> scsi: Define basic framework for driver LUN/target based error handle
> scsi:scsi_error: Move complete variable eh_action from shost to sdevice
> scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
> scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
> scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
> scsi:scsi_error: Add flags to mark error handle steps has done
> scsi:scsi_error: Define helper to perform LUN based error handle
> scsi:scsi_error: Add LUN based error handler based previous helper
> scsi:core: increase/decrease target_busy without check can_queue
> scsi:scsi_error: Define helper to perform target based error handle
> scsi:scsi_error: Add target based error handler based previous helper
> scsi:scsi_debug: Add param to control if setup LUN based error handle
> scsi:scsi_debug: Add param to control if setup target based error handle
>
> drivers/scsi/scsi_debug.c | 19 +
> drivers/scsi/scsi_error.c | 705 ++++++++++++++++++++++++++++++++++---
> drivers/scsi/scsi_lib.c | 23 +-
> drivers/scsi/scsi_priv.h | 20 ++
> include/scsi/scsi_device.h | 97 +++++
> include/scsi/scsi_eh.h | 4 +
> include/scsi/scsi_host.h | 2 -
> 7 files changed, 813 insertions(+), 57 deletions(-)
>
* Re: [PATCH 00/13] scsi: Support LUN/target based error handle
2023-07-23 23:44 [PATCH 00/13] scsi: Support LUN/target based error handle Wenchao Hao
` (15 preceding siblings ...)
2023-08-21 13:31 ` haowenchao (C)
@ 2023-08-30 9:45 ` haowenchao (C)
16 siblings, 0 replies; 20+ messages in thread
From: haowenchao (C) @ 2023-08-30 9:45 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
linux-scsi, linux-kernel, wubo40
Cc: louhongxiang
On 2023/7/24 7:44, Wenchao Hao wrote:
Ping again...
> The original error handler sets the host to the recovery state and
> performs error recovery operations, which prevents all LUNs sharing that
> host from handling I/O. This is unacceptable for systems that deploy
> many LUNs on one HBA.
>
> This patchset introduces support for LUN/target based error handling;
> drivers can choose whether to implement it. They can implement LUN based,
> target based, or both LUN and target based error handling with their own
> error handling strategy. The first patch defines this framework. It
> abstracts three key operations: add an error command, wake up the error
> handler, and block I/O once an error command has been added and while
> recovery is in progress. Drivers should implement these three callbacks
> and set them up in the SCSI mid layer.
>
> Besides the basic framework, this patchset also adds a basic LUN/target
> based error handling strategy.
>
> For LUN based eh, it tries check sense, start unit and LUN reset. If
> these steps cannot recover all error commands, it falls back to
> further recovery such as target based (if implemented) or host based
> error handling.
>
> The same applies to target based eh: it tries check sense, start unit,
> LUN reset and target reset. If these steps cannot recover all error
> commands, it falls back to host based error handling.
>
> This patchset was tested with scsi_debug, which supports single LUN error
> injection. The scsi_debug patches are here:
>
> https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
>
> Wenchao Hao (13):
> scsi: Define basic framework for driver LUN/target based error handle
> scsi:scsi_error: Move complete variable eh_action from shost to sdevice
> scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
> scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
> scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
> scsi:scsi_error: Add flags to mark error handle steps has done
> scsi:scsi_error: Define helper to perform LUN based error handle
> scsi:scsi_error: Add LUN based error handler based previous helper
> scsi:core: increase/decrease target_busy without check can_queue
> scsi:scsi_error: Define helper to perform target based error handle
> scsi:scsi_error: Add target based error handler based previous helper
> scsi:scsi_debug: Add param to control if setup LUN based error handle
> scsi:scsi_debug: Add param to control if setup target based error handle
>
> drivers/scsi/scsi_debug.c | 19 +
> drivers/scsi/scsi_error.c | 705 ++++++++++++++++++++++++++++++++++---
> drivers/scsi/scsi_lib.c | 23 +-
> drivers/scsi/scsi_priv.h | 20 ++
> include/scsi/scsi_device.h | 97 +++++
> include/scsi/scsi_eh.h | 4 +
> include/scsi/scsi_host.h | 2 -
> 7 files changed, 813 insertions(+), 57 deletions(-)
>