linux-scsi.vger.kernel.org archive mirror
* [PATCH v5 0/3] SCSI: Fix issues between removing device and error handle
@ 2024-06-05  9:17 Wenchao Hao
  2024-06-05  9:17 ` [PATCH v5 1/3] scsi: core: Add new helper to iterate all devices of host Wenchao Hao
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Wenchao Hao @ 2024-06-05  9:17 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel
  Cc: Wenchao Hao

Two issues are triggered because devices that are being removed are
skipped by shost_for_each_device(). Both occur mainly in the error
recovery path:

1. The statistics printed at the beginning of scsi_error_handler() are
   wrong.
2. Device reset is not triggered. Drivers like smartpqi implement only
   eh_device_reset_handler. If device reset is skipped, the commands
   that have already been sent to the firmware or device hardware are
   not cleared. The error handler then flushes all these commands in
   scsi_unjam_host(). When the hardware later completes the commands, a
   use-after-free is triggered.
   The issue was first seen with smartpqi devices and can be reproduced
   with scsi_debug. I did not find any description stating that a device
   in the SDEV_DEL state cannot perform a device reset, so this should
   be addressed.

A new macro, shost_for_each_device_include_deleted(), is added to address
these issues. Unlike shost_for_each_device(), it does not skip a
scsi_device that is being removed when iterating over the host's devices.
It is used when collecting the host's error statistics and when trying to
reset a scsi_device in the error recovery path.
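
To illustrate the difference between the two iteration styles, here is a
minimal user-space sketch. It is not the kernel code from this series:
struct model_dev, MODEL_RUNNING/MODEL_REMOVING, next_dev() and the two
for_each_dev_*() macros are hypothetical names invented for this example,
and the real helpers also take device references, which is omitted here.
It only shows how skipping devices that are being removed can undercount
failed commands and leave such a device without a reset.

```c
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel's definitions. */
enum model_state { MODEL_RUNNING, MODEL_REMOVING };

struct model_dev {
	enum model_state state;
	int failed_cmds;	/* commands waiting for error handling */
	int was_reset;		/* set once a device reset was issued */
};

/* Return the first index >= i to visit, or n when iteration is done. */
static size_t next_dev(const struct model_dev *devs, size_t n, size_t i,
		       int include_deleted)
{
	while (i < n && !include_deleted && devs[i].state == MODEL_REMOVING)
		i++;
	return i;
}

/* Models the existing behaviour: devices being removed are skipped. */
#define for_each_dev_skip_deleted(i, devs, n) \
	for ((i) = next_dev((devs), (n), 0, 0); (i) < (n); \
	     (i) = next_dev((devs), (n), (i) + 1, 0))

/* Models the new helper: every device is visited. */
#define for_each_dev_include_deleted(i, devs, n) \
	for ((i) = next_dev((devs), (n), 0, 1); (i) < (n); \
	     (i) = next_dev((devs), (n), (i) + 1, 1))

/* Sum the failed commands seen by the chosen iterator. */
static int count_failed(const struct model_dev *devs, size_t n,
			int include_deleted)
{
	size_t i;
	int total = 0;

	if (include_deleted) {
		for_each_dev_include_deleted(i, devs, n)
			total += devs[i].failed_cmds;
	} else {
		for_each_dev_skip_deleted(i, devs, n)
			total += devs[i].failed_cmds;
	}
	return total;
}

/* Mark every visited device with failed commands as reset; return how many. */
static int reset_failed_devs(struct model_dev *devs, size_t n,
			     int include_deleted)
{
	size_t i;
	int resets = 0;

	for (i = next_dev(devs, n, 0, include_deleted); i < n;
	     i = next_dev(devs, n, i + 1, include_deleted)) {
		if (devs[i].failed_cmds > 0 && !devs[i].was_reset) {
			devs[i].was_reset = 1;
			resets++;
		}
	}
	return resets;
}
```

With one running device holding one failed command and one device in
removal holding two, the skipping iterator in this model counts one failed
command and never resets the second device, while the including iterator
counts three and resets both.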

V5:
 - Rewrite cover letter and add a Fixes tag to each patch

V4:
 - Remove the fourth patch, which fixed an IO hang during device
   removal, because that issue is fixed by commit 6df0e077d76bd
   ("scsi: core: Kick the requeue list after inserting when flushing")

V3:
  - Update patch description
  - Update comments of functions added

V2:
  - Fix IO hang by running all devices' queues after the error handler
  - Do not modify shost_for_each_device() directly; instead add a new
    helper that iterates devices without skipping those being removed

Wenchao Hao (3):
  scsi: core: Add new helper to iterate all devices of host
  scsi: scsi_error: Fix wrong statistic when print error info
  scsi: scsi_error: Fix device reset is not triggered

 drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
 drivers/scsi/scsi_error.c  |  4 ++--
 include/scsi/scsi_device.h | 25 ++++++++++++++++++---
 3 files changed, 56 insertions(+), 19 deletions(-)

-- 
2.38.1



Thread overview: 11+ messages
2024-06-05  9:17 [PATCH v5 0/3] SCSI: Fix issues between removing device and error handle Wenchao Hao
2024-06-05  9:17 ` [PATCH v5 1/3] scsi: core: Add new helper to iterate all devices of host Wenchao Hao
2024-06-12  8:33   ` Hannes Reinecke
2024-06-12 15:06     ` Wenchao Hao
2024-06-13  6:27       ` Hannes Reinecke
2024-06-13  7:10         ` Wenchao Hao
2024-07-12  2:20           ` Wenchao Hao
2024-06-05  9:17 ` [PATCH v5 2/3] scsi: scsi_error: Fix wrong statistic when print error info Wenchao Hao
2024-06-12  8:34   ` Hannes Reinecke
2024-06-12 15:12     ` Wenchao Hao
2024-06-05  9:17 ` [PATCH v5 3/3] scsi: scsi_error: Fix device reset is not triggered Wenchao Hao
