* [PATCH v2 0/4] SCSI: Fix issues between removing device and error handle
From: Wenchao Hao @ 2023-09-28 7:35 UTC
To: James E . J . Bottomley, Martin K . Petersen, linux-scsi
Cc: linux-kernel, louhongxiang, Wenchao Hao
While testing SCSI error handling with my earlier scsi_debug error
injection patches, I found several issues that occur when a device is
removed while the error handler is running.
These issues are triggered because shost_for_each_device() skips devices
that are being removed.
Three issues were found:
1. The statistics printed at the beginning of scsi_error_handler are wrong
2. The device reset is not triggered
3. IO requeued to the request_queue hangs after error handling
V2:
- Fix the IO hang by running the queues of all devices after the error handler
- Do not modify shost_for_each_device() directly; instead add a new
helper that iterates over devices without skipping those being removed
Wenchao Hao (4):
scsi: core: Add new helper to iterate all devices of host
scsi: scsi_error: Fix wrong statistic when print error info
scsi: scsi_error: Fix device reset is not triggered
scsi: scsi_core: Fix IO hang when device removing
drivers/scsi/scsi.c | 43 +++++++++++++++++++++++++-------------
drivers/scsi/scsi_error.c | 4 ++--
drivers/scsi/scsi_lib.c | 2 +-
include/scsi/scsi_device.h | 25 +++++++++++++++++++---
4 files changed, 53 insertions(+), 21 deletions(-)
--
2.32.0
* [PATCH v2 1/4] scsi: core: Add new helper to iterate all devices of host
From: Wenchao Hao @ 2023-09-28 7:35 UTC
To: James E . J . Bottomley, Martin K . Petersen, linux-scsi
Cc: linux-kernel, louhongxiang, Wenchao Hao
shost_for_each_device() skips devices that are in the SDEV_CANCEL or
SDEV_DEL state. In some scenarios we do not want to skip these devices,
so add a new macro, shost_for_each_device_include_deleted(), to handle them.
Split scsi_device_get() into a new helper, __scsi_device_get(), and add a
new parameter "skip_deleted" to __scsi_iterate_devices() to implement this macro.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi.c | 43 +++++++++++++++++++++++++-------------
include/scsi/scsi_device.h | 25 +++++++++++++++++++---
2 files changed, 50 insertions(+), 18 deletions(-)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index d0911bc28663..9e31398b6e03 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -704,6 +704,26 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
return 0;
}
+static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
+{
+ /*
+ * If skip_deleted is true and the device is being removed, fail.
+ */
+ if (skip_deleted &&
+ (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
+ goto fail;
+ if (!try_module_get(sdev->host->hostt->module))
+ goto fail;
+ if (!get_device(&sdev->sdev_gendev))
+ goto fail_put_module;
+ return 0;
+
+fail_put_module:
+ module_put(sdev->host->hostt->module);
+fail:
+ return -ENXIO;
+}
+
/**
* scsi_device_get - get an additional reference to a scsi_device
* @sdev: device to get a reference to
@@ -717,18 +737,7 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
*/
int scsi_device_get(struct scsi_device *sdev)
{
- if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
- goto fail;
- if (!try_module_get(sdev->host->hostt->module))
- goto fail;
- if (!get_device(&sdev->sdev_gendev))
- goto fail_put_module;
- return 0;
-
-fail_put_module:
- module_put(sdev->host->hostt->module);
-fail:
- return -ENXIO;
+ return __scsi_device_get(sdev, true);
}
EXPORT_SYMBOL(scsi_device_get);
@@ -749,9 +758,13 @@ void scsi_device_put(struct scsi_device *sdev)
}
EXPORT_SYMBOL(scsi_device_put);
-/* helper for shost_for_each_device, see that for documentation */
+/**
+ * helper for shost_for_each_device, see that for documentation
+ * @skip_deleted: if true, sdev in progress of removing would be skipped
+ */
struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
- struct scsi_device *prev)
+ struct scsi_device *prev,
+ bool skip_deleted)
{
struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
struct scsi_device *next = NULL;
@@ -761,7 +774,7 @@ struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
while (list->next != &shost->__devices) {
next = list_entry(list->next, struct scsi_device, siblings);
/* skip devices that we can't get a reference to */
- if (!scsi_device_get(next))
+ if (!__scsi_device_get(next, skip_deleted))
break;
next = NULL;
list = list->next;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index b9230b6add04..6f8df9b04be3 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -390,7 +390,8 @@ extern void __starget_for_each_device(struct scsi_target *, void *,
/* only exposed to implement shost_for_each_device */
extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
- struct scsi_device *);
+ struct scsi_device *,
+ bool);
/**
* shost_for_each_device - iterate over all devices of a host
@@ -400,11 +401,29 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
* Iterator that returns each device attached to @shost. This loop
* takes a reference on each device and releases it at the end. If
* you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro skips devices that are being removed
*/
#define shost_for_each_device(sdev, shost) \
- for ((sdev) = __scsi_iterate_devices((shost), NULL); \
+ for ((sdev) = __scsi_iterate_devices((shost), NULL, 1); \
+ (sdev); \
+ (sdev) = __scsi_iterate_devices((shost), (sdev), 1))
+
+/**
+ * shost_for_each_device_include_deleted - iterate over all devices of a host
+ * @sdev: the &struct scsi_device to use as a cursor
+ * @shost: the &struct scsi_host to iterate over
+ *
+ * Iterator that returns each device attached to @shost. This loop
+ * takes a reference on each device and releases it at the end. If
+ * you break out of the loop, you must call scsi_device_put(sdev).
+ *
+ * Note: this macro also includes devices that are being removed
+ */
+#define shost_for_each_device_include_deleted(sdev, shost) \
+ for ((sdev) = __scsi_iterate_devices((shost), NULL, 0); \
(sdev); \
- (sdev) = __scsi_iterate_devices((shost), (sdev)))
+ (sdev) = __scsi_iterate_devices((shost), (sdev), 0))
/**
* __shost_for_each_device - iterate over all devices of a host (UNLOCKED)
--
2.32.0
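
The iteration change in this patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the names model_sdev, model_device_get(), model_iterate() and model_count() are hypothetical, an array stands in for the host's device list, and module references and host_lock are left out entirely. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's device states (assumption). */
enum sdev_state { SDEV_RUNNING, SDEV_CANCEL, SDEV_DEL };

/* Hypothetical model of a scsi_device: a state and a reference count. */
struct model_sdev {
	enum sdev_state state;
	int refcount;
};

/* Models __scsi_device_get(): returns 0 on success, -1 if refused. */
static int model_device_get(struct model_sdev *sdev, bool skip_deleted)
{
	if (skip_deleted &&
	    (sdev->state == SDEV_DEL || sdev->state == SDEV_CANCEL))
		return -1;
	sdev->refcount++;
	return 0;
}

/* Models scsi_device_put(): drops one reference. */
static void model_device_put(struct model_sdev *sdev)
{
	assert(sdev->refcount > 0);
	sdev->refcount--;
}

/*
 * Models __scsi_iterate_devices(): returns the next device a reference
 * could be taken on, and drops the reference held on @prev.
 * Pass prev == NULL to start the walk.
 */
static struct model_sdev *model_iterate(struct model_sdev *devs, size_t n,
					struct model_sdev *prev,
					bool skip_deleted)
{
	size_t i = prev ? (size_t)(prev - devs) + 1 : 0;
	struct model_sdev *next = NULL;

	for (; i < n; i++) {
		if (!model_device_get(&devs[i], skip_deleted)) {
			next = &devs[i];
			break;
		}
	}
	if (prev)
		model_device_put(prev);
	return next;
}

/* Counts the devices visited by a full walk with the given policy. */
static size_t model_count(struct model_sdev *devs, size_t n, bool skip_deleted)
{
	struct model_sdev *sdev;
	size_t count = 0;

	for (sdev = model_iterate(devs, n, NULL, skip_deleted); sdev;
	     sdev = model_iterate(devs, n, sdev, skip_deleted))
		count++;
	return count;
}
```

With skip_deleted set, the walk behaves like shost_for_each_device(); with it cleared, it behaves like the new shost_for_each_device_include_deleted() and also visits devices in the SDEV_CANCEL and SDEV_DEL states.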
* [PATCH v2 2/4] scsi: scsi_error: Fix wrong statistic when print error info
From: Wenchao Hao @ 2023-09-28 7:35 UTC
To: James E . J . Bottomley, Martin K . Petersen, linux-scsi
Cc: linux-kernel, louhongxiang, Wenchao Hao
shost_for_each_device() skips devices that are being removed, so the
commands of these devices are ignored in scsi_eh_prt_fail_stats().
Fix this by using shost_for_each_device_include_deleted()
to iterate over devices in scsi_eh_prt_fail_stats().
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..2550f8cd182a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -407,7 +407,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
int cmd_cancel = 0;
int devices_failed = 0;
- shost_for_each_device(sdev, shost) {
+ shost_for_each_device_include_deleted(sdev, shost) {
list_for_each_entry(scmd, work_q, eh_entry) {
if (scmd->device == sdev) {
++total_failures;
--
2.32.0
* [PATCH v2 3/4] scsi: scsi_error: Fix device reset is not triggered
From: Wenchao Hao @ 2023-09-28 7:35 UTC
To: James E . J . Bottomley, Martin K . Petersen, linux-scsi
Cc: linux-kernel, louhongxiang, Wenchao Hao
shost_for_each_device() skips devices that are being removed, so
scsi_try_bus_device_reset() is skipped for these devices in
scsi_eh_bus_device_reset() when events happen in the following order:
T1:                                      T2: scsi_error_handler
__scsi_remove_device
  scsi_device_set_state(sdev, SDEV_DEL)
                                         // skips devices in the SDEV_DEL state
                                         shost_for_each_device()
                                           scsi_try_bus_device_reset
                                         flush all commands
  ...
  release and free scsi_device
Some drivers, such as smartpqi, implement only eh_device_reset_handler.
If the device reset is skipped, commands that have already been sent to
the firmware or device hardware are not cleared, yet the error handler
still flushes all of these commands in scsi_unjam_host().
When the hardware later completes these commands, a use-after-free is
triggered.
Fix this by using shost_for_each_device_include_deleted()
to iterate over devices in scsi_eh_bus_device_reset().
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_error.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 2550f8cd182a..57e3cc556549 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1568,7 +1568,7 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
struct scsi_device *sdev;
enum scsi_disposition rtn;
- shost_for_each_device(sdev, shost) {
+ shost_for_each_device_include_deleted(sdev, shost) {
if (scsi_host_eh_past_deadline(shost)) {
SCSI_LOG_ERROR_RECOVERY(3,
sdev_printk(KERN_INFO, sdev,
--
2.32.0
* [PATCH v2 4/4] scsi: scsi_core: Fix IO hang when device removing
From: Wenchao Hao @ 2023-09-28 7:35 UTC
To: James E . J . Bottomley, Martin K . Petersen, linux-scsi
Cc: linux-kernel, louhongxiang, Wenchao Hao
shost_for_each_device() skips devices that are being removed, so
scsi_run_queue() is skipped for these devices in
scsi_run_host_queues() after the host's IO has been blocked.
An IO hang occurs when the device is in the SDEV_CANCEL state and events
happen in the following order:
T1:                                         T2: scsi_error_handler
__scsi_remove_device()
  scsi_device_set_state(sdev, SDEV_CANCEL)
  ...
  sd_remove()
    del_gendisk()
      blk_mq_freeze_queue_wait()
                                            scsi_eh_flush_done_q()
                                              scsi_queue_insert(scmd,...)
This happens because scsi_queue_insert() no longer kicks the device's queue since commit
8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary").
After scsi_unjam_host(), the SCSI error handler calls scsi_run_queue()
to run the queues of the host's devices, but it does not run the queues of
devices that are being removed, because shost_for_each_device()
skips them.
As a result, the requests added to these queues are never handled,
and the device removal hangs as well.
Fix this by using shost_for_each_device_include_deleted() in
scsi_run_host_queues() so that scsi_run_queue() is also called for devices
that are being removed.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
---
drivers/scsi/scsi_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c2f647a7c1b0..34b408d182e2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -466,7 +466,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
{
struct scsi_device *sdev;
- shost_for_each_device(sdev, shost)
+ shost_for_each_device_include_deleted(sdev, shost)
scsi_run_queue(sdev->request_queue);
}
--
2.32.0
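
The hang described in this patch can be sketched with a small user-space model. Everything here is an assumption made for illustration: model_sdev, model_run_queue() and model_run_host_queues() are hypothetical names, and a pair of counters stands in for the block layer's requeue list. The sketch only shows why requests stay stuck when devices being removed are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's device states (assumption). */
enum sdev_state { SDEV_RUNNING, SDEV_CANCEL, SDEV_DEL };

/* Hypothetical model of a device and its request queue. */
struct model_sdev {
	enum sdev_state state;
	unsigned int requeued;   /* requests put back by the error handler */
	unsigned int dispatched; /* requests that were run again */
};

/* Models scsi_run_queue(): every requeued request gets dispatched. */
static void model_run_queue(struct model_sdev *sdev)
{
	sdev->dispatched += sdev->requeued;
	sdev->requeued = 0;
}

static bool model_is_removing(const struct model_sdev *sdev)
{
	return sdev->state == SDEV_CANCEL || sdev->state == SDEV_DEL;
}

/*
 * Models scsi_run_host_queues(). With skip_deleted set, devices being
 * removed are not visited, so their requeued requests are never run.
 * Returns the number of requests left stuck on skipped devices.
 */
static unsigned int model_run_host_queues(struct model_sdev *devs, size_t n,
					  bool skip_deleted)
{
	unsigned int stuck = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (skip_deleted && model_is_removing(&devs[i])) {
			stuck += devs[i].requeued;
			continue;
		}
		model_run_queue(&devs[i]);
	}
	assert(skip_deleted || stuck == 0);
	return stuck;
}
```

In this model, a device in SDEV_CANCEL with requeued requests keeps them forever under the skipping policy, which corresponds to blk_mq_freeze_queue_wait() never returning in the removal path.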
* Re: [PATCH v2 1/4] scsi: core: Add new helper to iterate all devices of host
From: kernel test robot @ 2023-09-28 11:41 UTC
To: Wenchao Hao, James E . J . Bottomley, Martin K . Petersen,
linux-scsi
Cc: oe-kbuild-all, linux-kernel, louhongxiang, Wenchao Hao
Hi Wenchao,
kernel test robot noticed the following build warnings:
[auto build test WARNING on mkp-scsi/for-next]
[also build test WARNING on jejb-scsi/for-next linus/master v6.6-rc3 next-20230928]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Wenchao-Hao/scsi-core-Add-new-helper-to-iterate-all-devices-of-host/20230928-153648
base: https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
patch link: https://lore.kernel.org/r/20230928073543.3496394-2-haowenchao2%40huawei.com
patch subject: [PATCH v2 1/4] scsi: core: Add new helper to iterate all devices of host
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230928/202309281916.qy89onYp-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230928/202309281916.qy89onYp-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309281916.qy89onYp-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/scsi/scsi.c:762: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* helper for shost_for_each_device, see that for documentation
vim +762 drivers/scsi/scsi.c
760
761 /**
> 762 * helper for shost_for_each_device, see that for documentation
763 * @skip_deleted: if true, sdev in progress of removing would be skipped
764 */
765 struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
766 struct scsi_device *prev,
767 bool skip_deleted)
768 {
769 struct list_head *list = (prev ? &prev->siblings : &shost->__devices);
770 struct scsi_device *next = NULL;
771 unsigned long flags;
772
773 spin_lock_irqsave(shost->host_lock, flags);
774 while (list->next != &shost->__devices) {
775 next = list_entry(list->next, struct scsi_device, siblings);
776 /* skip devices that we can't get a reference to */
777 if (!__scsi_device_get(next, skip_deleted))
778 break;
779 next = NULL;
780 list = list->next;
781 }
782 spin_unlock_irqrestore(shost->host_lock, flags);
783
784 if (prev)
785 scsi_device_put(prev);
786 return next;
787 }
788 EXPORT_SYMBOL(__scsi_iterate_devices);
789
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
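
One way to address the warning above is to give the comment the shape kernel-doc expects: the function name, a short description, and one line per parameter. The following is only a sketch. The parameter descriptions for @shost and @prev and the Return: text are assumptions derived from the function body quoted in the report, and the two forward declarations exist only so that the prototype compiles outside the kernel tree. Changing the opening "/**" to "/*" would be another way to silence the warning.

```c
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-ins so the prototype compiles outside the kernel tree. */
struct Scsi_Host;
struct scsi_device;

/**
 * __scsi_iterate_devices - helper for shost_for_each_device
 * @shost: host whose device list is walked (description is an assumption)
 * @prev: device returned by the previous call, or NULL to start
 * @skip_deleted: if true, devices that are being removed are skipped
 *
 * See shost_for_each_device() for documentation.
 *
 * Return: the next device on which a reference could be taken, or NULL
 * when the end of the list is reached.
 */
struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *shost,
					   struct scsi_device *prev,
					   bool skip_deleted);
```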
* Re: [PATCH v2 0/4] SCSI: Fix issues between removing device and error handle
From: Wenchao Hao @ 2023-10-07 9:46 UTC
To: James E . J . Bottomley, Martin K . Petersen, linux-scsi
Cc: linux-kernel, louhongxiang
On 2023/9/28 15:35, Wenchao Hao wrote:
> While testing SCSI error handling with my earlier scsi_debug error
> injection patches, I found several issues that occur when a device is
> removed while the error handler is running.
>
> These issues are triggered because shost_for_each_device() skips devices
> that are being removed.
>
ping...
* Re: [PATCH v2 0/4] SCSI: Fix issues between removing device and error handle
From: Wenchao Hao @ 2023-10-09 6:59 UTC
To: James E . J . Bottomley, Martin K . Petersen, linux-scsi
Cc: linux-kernel, louhongxiang
On 2023/9/28 15:35, Wenchao Hao wrote:
> While testing SCSI error handling with my earlier scsi_debug error
> injection patches, I found several issues that occur when a device is
> removed while the error handler is running.
>
> These issues are triggered because shost_for_each_device() skips devices
> that are being removed.
>
> Three issues were found:
> 1. The statistics printed at the beginning of scsi_error_handler are wrong
> 2. The device reset is not triggered
> 3. IO requeued to the request_queue hangs after error handling
>
These patches fix bugs that are easy to reproduce when device removal
and error handling happen at the same time, so a friendly ping again...
Thread overview: 8+ messages (newest: 2023-10-09 6:59 UTC)
2023-09-28 7:35 [PATCH v2 0/4] SCSI: Fix issues between removing device and error handle Wenchao Hao
2023-09-28 7:35 ` [PATCH v2 1/4] scsi: core: Add new helper to iterate all devices of host Wenchao Hao
2023-09-28 11:41 ` kernel test robot
2023-09-28 7:35 ` [PATCH v2 2/4] scsi: scsi_error: Fix wrong statistic when print error info Wenchao Hao
2023-09-28 7:35 ` [PATCH v2 3/4] scsi: scsi_error: Fix device reset is not triggered Wenchao Hao
2023-09-28 7:35 ` [PATCH v2 4/4] scsi: scsi_core: Fix IO hang when device removing Wenchao Hao
2023-10-07 9:46 ` [PATCH v2 0/4] SCSI: Fix issues between removing device and error handle Wenchao Hao
2023-10-09 6:59 ` Wenchao Hao