public inbox for linux-scsi@vger.kernel.org
* [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups
@ 2024-01-22  6:25 chenxiang
  2024-01-22  6:25 ` [PATCH 1/4] scsi: hisi_sas: Fix a deadlock issue related to automatic dump chenxiang
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: chenxiang @ 2024-01-22  6:25 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linuxarm, Xiang Chen

From: Xiang Chen <chenxiang66@hisilicon.com>

This series contains some fixes and cleanups including:
- Fix a deadlock issue related to automatic debugfs;
- Remove redundant checks for automatic debugfs;
- Check whether debugfs is enabled before removing or releasing it;
- Remove hisi_hba->timer for v3 hw;

Xiang Chen (1):
  scsi: hisi_sas: Remove hisi_hba->timer for v3 hw

Yihang Li (3):
  scsi: hisi_sas: Fix a deadlock issue related to automatic dump
  scsi: hisi_sas: Remove redundant checks for automatic debugfs dump
  scsi: hisi_sas: Check whether debugfs is enabled before removing or
    releasing it

 drivers/scsi/hisi_sas/hisi_sas_main.c  | 26 ++++++++++++++++++++------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  8 +++++---
 2 files changed, 25 insertions(+), 9 deletions(-)

-- 
2.8.1



* [PATCH 1/4] scsi: hisi_sas: Fix a deadlock issue related to automatic dump
  2024-01-22  6:25 [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups chenxiang
@ 2024-01-22  6:25 ` chenxiang
  2024-01-22  6:25 ` [PATCH 2/4] scsi: hisi_sas: Remove redundant checks for automatic debugfs dump chenxiang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: chenxiang @ 2024-01-22  6:25 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linuxarm, Yihang Li, Xiang Chen

From: Yihang Li <liyihang9@huawei.com>

If we issue a PHY disable command, the device attached to that PHY goes
offline. If a 2-bit ECC error occurs at the same time, a hung task may be
reported:

[ 4613.652388] INFO: task kworker/u256:0:165233 blocked for more than 120 seconds.
[ 4613.666297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.674809] task:kworker/u256:0  state:D stack:    0 pid:165233 ppid:     2 flags:0x00000208
[ 4613.683959] Workqueue: 0000:74:02.0_disco_q sas_revalidate_domain [libsas]
[ 4613.691518] Call trace:
[ 4613.694678]  __switch_to+0xf8/0x17c
[ 4613.698872]  __schedule+0x660/0xee0
[ 4613.703063]  schedule+0xac/0x240
[ 4613.706994]  schedule_timeout+0x500/0x610
[ 4613.711705]  __down+0x128/0x36c
[ 4613.715548]  down+0x240/0x2d0
[ 4613.719221]  hisi_sas_internal_abort_timeout+0x1bc/0x260 [hisi_sas_main]
[ 4613.726618]  sas_execute_internal_abort+0x144/0x310 [libsas]
[ 4613.732976]  sas_execute_internal_abort_dev+0x44/0x60 [libsas]
[ 4613.739504]  hisi_sas_internal_task_abort_dev.isra.0+0xbc/0x1b0 [hisi_sas_main]
[ 4613.747499]  hisi_sas_dev_gone+0x174/0x250 [hisi_sas_main]
[ 4613.753682]  sas_notify_lldd_dev_gone+0xec/0x2e0 [libsas]
[ 4613.759781]  sas_unregister_common_dev+0x4c/0x7a0 [libsas]
[ 4613.765962]  sas_destruct_devices+0xb8/0x120 [libsas]
[ 4613.771709]  sas_do_revalidate_domain.constprop.0+0x1b8/0x31c [libsas]
[ 4613.778930]  sas_revalidate_domain+0x60/0xa4 [libsas]
[ 4613.784716]  process_one_work+0x248/0x950
[ 4613.789424]  worker_thread+0x318/0x934
[ 4613.793878]  kthread+0x190/0x200
[ 4613.797810]  ret_from_fork+0x10/0x18
[ 4613.802121] INFO: task kworker/u256:4:316722 blocked for more than 120 seconds.
[ 4613.816026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.824538] task:kworker/u256:4  state:D stack:    0 pid:316722 ppid:     2 flags:0x00000208
[ 4613.833670] Workqueue: 0000:74:02.0 hisi_sas_rst_work_handler [hisi_sas_main]
[ 4613.841491] Call trace:
[ 4613.844647]  __switch_to+0xf8/0x17c
[ 4613.848852]  __schedule+0x660/0xee0
[ 4613.853052]  schedule+0xac/0x240
[ 4613.856984]  schedule_timeout+0x500/0x610
[ 4613.861695]  __down+0x128/0x36c
[ 4613.865542]  down+0x240/0x2d0
[ 4613.869216]  hisi_sas_controller_prereset+0x58/0x1fc [hisi_sas_main]
[ 4613.876324]  hisi_sas_rst_work_handler+0x40/0x8c [hisi_sas_main]
[ 4613.883019]  process_one_work+0x248/0x950
[ 4613.887732]  worker_thread+0x318/0x934
[ 4613.892204]  kthread+0x190/0x200
[ 4613.896118]  ret_from_fork+0x10/0x18
[ 4613.900423] INFO: task kworker/u256:1:348985 blocked for more than 121 seconds.
[ 4613.914341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.922852] task:kworker/u256:1  state:D stack:    0 pid:348985 ppid:     2 flags:0x00000208
[ 4613.931984] Workqueue: 0000:74:02.0_event_q sas_port_event_worker [libsas]
[ 4613.939549] Call trace:
[ 4613.942702]  __switch_to+0xf8/0x17c
[ 4613.946892]  __schedule+0x660/0xee0
[ 4613.951083]  schedule+0xac/0x240
[ 4613.955015]  schedule_timeout+0x500/0x610
[ 4613.959725]  wait_for_common+0x200/0x610
[ 4613.964349]  wait_for_completion+0x3c/0x5c
[ 4613.969146]  flush_workqueue+0x198/0x790
[ 4613.973776]  sas_porte_broadcast_rcvd+0x1e8/0x320 [libsas]
[ 4613.979960]  sas_port_event_worker+0x54/0xa0 [libsas]
[ 4613.985708]  process_one_work+0x248/0x950
[ 4613.990420]  worker_thread+0x318/0x934
[ 4613.994868]  kthread+0x190/0x200
[ 4613.998800]  ret_from_fork+0x10/0x18

This happens because, when the device goes offline, we take the hisi_hba
semaphore and send the ABORT_DEV command to the device. The internal abort
then times out due to the 2-bit ECC error and triggers an automatic dump.
Because the hisi_hba semaphore is already held, the dump cannot run and
the controller cannot be reset.

Therefore, the deadlock arises from the following circular dependency:
hisi_sas_dev_gone() -> down() -> hisi_sas_internal_task_abort_dev() -> ...
-> hisi_sas_internal_abort_timeout() -> down().

The deadlock is triggered only when the timeout occurs while a device is
going offline. To fix this issue, use .rst_ha_timeout to distinguish the
scenario where a device goes offline from other scenarios.
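
The circular dependency and the fix can be illustrated with a small
user-space model. This is only a sketch, not driver code: struct model_hba,
model_try_down(), model_up(), model_abort_timeout() and model_dev_gone() are
hypothetical names, and the semaphore is modelled as a plain counter whose
"down" fails where the kernel's down() would sleep forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the hisi_hba semaphore. */
struct model_hba {
	int sem_count;	/* 1 = free, 0 = held */
	int snapshots;	/* number of register snapshots taken */
};

/* Non-blocking down(): returns false where the kernel's down() would sleep. */
static bool model_try_down(struct model_hba *hba)
{
	if (hba->sem_count == 0)
		return false;
	hba->sem_count--;
	return true;
}

static void model_up(struct model_hba *hba)
{
	assert(hba->sem_count == 0);	/* only a holder may release */
	hba->sem_count++;
}

/*
 * Models the patched timeout handler: skip the semaphore when the caller
 * signals (rst_ha_timeout) that it already holds it. Returns false if the
 * handler would have blocked on the semaphore.
 */
static bool model_abort_timeout(struct model_hba *hba, bool rst_ha_timeout)
{
	if (!rst_ha_timeout && !model_try_down(hba))
		return false;
	hba->snapshots++;
	if (!rst_ha_timeout)
		model_up(hba);
	return true;
}

/* Models the device-gone path: take the semaphore, then the abort times out. */
bool model_dev_gone(struct model_hba *hba, bool rst_ha_timeout)
{
	bool ok;

	if (!model_try_down(hba))
		return false;
	ok = model_abort_timeout(hba, rst_ha_timeout);
	model_up(hba);
	return ok;
}
```

With rst_ha_timeout false the nested call fails in the model, which
corresponds to the second down() never returning; with rst_ha_timeout true
the snapshot is taken under the semaphore that the caller already holds.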

Fixes: 2ff07b5c6fe9 ("scsi: hisi_sas: Directly call register snapshot instead of using workqueue")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index bbb7b2d..1abc62b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1962,9 +1962,17 @@ static bool hisi_sas_internal_abort_timeout(struct sas_task *task,
 	struct hisi_sas_internal_abort_data *timeout = data;
 
 	if (hisi_sas_debugfs_enable && hisi_hba->debugfs_itct[0].itct) {
-		down(&hisi_hba->sem);
+		/*
+		 * If timeout occurs in device gone scenario, to avoid
+		 * circular dependency like:
+		 * hisi_sas_dev_gone() -> down() -> ... ->
+		 * hisi_sas_internal_abort_timeout() -> down().
+		 */
+		if (!timeout->rst_ha_timeout)
+			down(&hisi_hba->sem);
 		hisi_hba->hw->debugfs_snapshot_regs(hisi_hba);
-		up(&hisi_hba->sem);
+		if (!timeout->rst_ha_timeout)
+			up(&hisi_hba->sem);
 	}
 
 	if (task->task_state_flags & SAS_TASK_STATE_DONE) {
-- 
2.8.1



* [PATCH 2/4] scsi: hisi_sas: Remove redundant checks for automatic debugfs dump
  2024-01-22  6:25 [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups chenxiang
  2024-01-22  6:25 ` [PATCH 1/4] scsi: hisi_sas: Fix a deadlock issue related to automatic dump chenxiang
@ 2024-01-22  6:25 ` chenxiang
  2024-01-22  6:25 ` [PATCH 3/4] scsi: hisi_sas: Check whether debugfs is enabled before removing or releasing it chenxiang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: chenxiang @ 2024-01-22  6:25 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linuxarm, Yihang Li, Xiang Chen

From: Yihang Li <liyihang9@huawei.com>

In commit 63f0733d07ce ("scsi: hisi_sas: Allocate DFX memory during dump
trigger"), the DFX memory allocation was moved from device initialization
to the time a dump is triggered, so .debugfs_itct is not a valid address
and does not need to be checked.

The parameter hisi_sas_debugfs_enable is enough to decide whether an
automatic debugfs dump should be triggered, so remove the redundant
checks.
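
A rough user-space model of why the extra pointer check matters once the
buffers are allocated lazily. It assumes, as the referenced commit title
suggests, that the snapshot path itself allocates the buffer on first use;
struct model_dfx and the model_* functions are hypothetical names, not the
driver's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-dump DFX buffers. */
struct model_dfx {
	void *itct;	/* NULL until the first dump allocates it */
	int dumps;	/* number of snapshots taken */
};

/* Snapshot that allocates its buffer on first use (assumed behaviour). */
static bool model_snapshot(struct model_dfx *dfx)
{
	if (!dfx->itct) {
		dfx->itct = malloc(64);
		if (!dfx->itct)
			return false;
	}
	dfx->dumps++;
	return true;
}

/* Old gate: additionally requires the buffer to exist already. */
bool model_trigger_old(struct model_dfx *dfx, bool debugfs_enable)
{
	if (debugfs_enable && dfx->itct)
		return model_snapshot(dfx);
	return false;
}

/* New gate: the module parameter alone decides. */
bool model_trigger_new(struct model_dfx *dfx, bool debugfs_enable)
{
	if (debugfs_enable)
		return model_snapshot(dfx);
	return false;
}
```

In this model the old gate never takes the first automatic dump, because the
pointer it tests is only set by a dump.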

Fixes: 63f0733d07ce ("scsi: hisi_sas: Allocate DFX memory during dump trigger")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 1abc62b..70c998d 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1573,7 +1573,7 @@ static int hisi_sas_controller_prereset(struct hisi_hba *hisi_hba)
 		return -EPERM;
 	}
 
-	if (hisi_sas_debugfs_enable && hisi_hba->debugfs_itct[0].itct)
+	if (hisi_sas_debugfs_enable)
 		hisi_hba->hw->debugfs_snapshot_regs(hisi_hba);
 
 	return 0;
@@ -1961,7 +1961,7 @@ static bool hisi_sas_internal_abort_timeout(struct sas_task *task,
 	struct hisi_hba *hisi_hba = dev_to_hisi_hba(device);
 	struct hisi_sas_internal_abort_data *timeout = data;
 
-	if (hisi_sas_debugfs_enable && hisi_hba->debugfs_itct[0].itct) {
+	if (hisi_sas_debugfs_enable) {
 		/*
 		 * If timeout occurs in device gone scenario, to avoid
 		 * circular dependency like:
-- 
2.8.1



* [PATCH 3/4] scsi: hisi_sas: Check whether debugfs is enabled before removing or releasing it
  2024-01-22  6:25 [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups chenxiang
  2024-01-22  6:25 ` [PATCH 1/4] scsi: hisi_sas: Fix a deadlock issue related to automatic dump chenxiang
  2024-01-22  6:25 ` [PATCH 2/4] scsi: hisi_sas: Remove redundant checks for automatic debugfs dump chenxiang
@ 2024-01-22  6:25 ` chenxiang
  2024-01-22  6:25 ` [PATCH 4/4] scsi: hisi_sas: Remove hisi_hba->timer for v3 hw chenxiang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: chenxiang @ 2024-01-22  6:25 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linuxarm, Yihang Li, Xiang Chen

From: Yihang Li <liyihang9@huawei.com>

The hisi_sas debugfs teardown should run only when debugfs is enabled, so
check hisi_sas_debugfs_enable before removing or releasing it.
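
The idea is simply that setup and teardown are gated by the same flag. A
minimal user-space sketch, with hypothetical names (struct model_debugfs,
model_debugfs_init(), model_debugfs_exit()) standing in for the driver's
debugfs state and helpers:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's debugfs state. */
struct model_debugfs {
	bool created;		/* true once the directory exists */
	int remove_calls;	/* how often teardown actually ran */
};

/* Setup runs only when the feature is enabled. */
void model_debugfs_init(struct model_debugfs *d, bool debugfs_enable)
{
	if (debugfs_enable)
		d->created = true;
}

/* Teardown is gated by the same flag, so it mirrors setup exactly. */
void model_debugfs_exit(struct model_debugfs *d, bool debugfs_enable)
{
	if (!debugfs_enable)
		return;
	d->created = false;
	d->remove_calls++;
}
```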

Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 3 ++-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 7 +++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 70c998d..0b66c73 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -2625,7 +2625,8 @@ static __exit void hisi_sas_exit(void)
 {
 	sas_release_transport(hisi_sas_stt);
 
-	debugfs_remove(hisi_sas_debugfs_dir);
+	if (hisi_sas_debugfs_enable)
+		debugfs_remove(hisi_sas_debugfs_dir);
 }
 
 module_init(hisi_sas_init);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index b56fbc6..033298d 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -4902,7 +4902,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 err_out_remove_host:
 	scsi_remove_host(shost);
 err_out_undo_debugfs:
-	debugfs_exit_v3_hw(hisi_hba);
+	if (hisi_sas_debugfs_enable)
+		debugfs_exit_v3_hw(hisi_hba);
 err_out_free_host:
 	hisi_sas_free(hisi_hba);
 	scsi_host_put(shost);
@@ -4942,7 +4943,9 @@ static void hisi_sas_v3_remove(struct pci_dev *pdev)
 
 	hisi_sas_v3_destroy_irqs(pdev, hisi_hba);
 	hisi_sas_free(hisi_hba);
-	debugfs_exit_v3_hw(hisi_hba);
+	if (hisi_sas_debugfs_enable)
+		debugfs_exit_v3_hw(hisi_hba);
+
 	scsi_host_put(shost);
 }
 
-- 
2.8.1



* [PATCH 4/4] scsi: hisi_sas: Remove hisi_hba->timer for v3 hw
  2024-01-22  6:25 [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups chenxiang
                   ` (2 preceding siblings ...)
  2024-01-22  6:25 ` [PATCH 3/4] scsi: hisi_sas: Check whether debugfs is enabled before removing or releasing it chenxiang
@ 2024-01-22  6:25 ` chenxiang
  2024-01-25  2:14 ` [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups Martin K. Petersen
  2024-01-30  2:26 ` Martin K. Petersen
  5 siblings, 0 replies; 7+ messages in thread
From: chenxiang @ 2024-01-22  6:25 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linuxarm, Xiang Chen

From: Xiang Chen <chenxiang66@hisilicon.com>

hisi_hba->timer is not actually used for v3 hw, but v3 hw still reaches
operations on hisi_hba->timer in two places:
- the timer is deleted in hisi_sas_v3_remove(), which is only for v3 hw;
- the timer is deleted in hisi_sas_controller_reset_prepare(), which is
  common to v1/v2/v3 hw.

We can simply remove the first call. For the second, we need to skip the
deletion only for v3 hw, so check hw->sht, which is NULL only for v3 hw,
before deleting hisi_hba->timer.
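
The check in the common path can be pictured with the following user-space
sketch. All names here (struct model_sht, struct model_hw,
struct model_timer_hba, model_reset_prepare()) are hypothetical; the only
thing taken from the patch is the idea that a non-NULL hw->sht identifies
v1/v2 hw, which is where the timer is used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder for a host template. */
struct model_sht {
	int unused;
};

/* Per-generation hw description: sht is non-NULL for v1/v2, NULL for v3. */
struct model_hw {
	const struct model_sht *sht;
};

struct model_timer_hba {
	const struct model_hw *hw;
	bool timer_armed;
	int timer_deletes;	/* how often the timer was deleted */
};

/* Common reset-prepare path: touch the timer only where it is used. */
void model_reset_prepare(struct model_timer_hba *hba)
{
	if (hba->hw->sht) {
		hba->timer_armed = false;
		hba->timer_deletes++;
	}
}
```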

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 7 ++++++-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 1 -
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 0b66c73..097dfe4 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1507,7 +1507,12 @@ void hisi_sas_controller_reset_prepare(struct hisi_hba *hisi_hba)
 	scsi_block_requests(shost);
 	hisi_hba->hw->wait_cmds_complete_timeout(hisi_hba, 100, 5000);
 
-	del_timer_sync(&hisi_hba->timer);
+	/*
+	 * hisi_hba->timer is only used for v1/v2 hw, and check hw->sht
+	 * which is also only used for v1/v2 hw to skip it for v3 hw
+	 */
+	if (hisi_hba->hw->sht)
+		del_timer_sync(&hisi_hba->timer);
 
 	set_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
 }
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 033298d..7d2a335 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -4935,7 +4935,6 @@ static void hisi_sas_v3_remove(struct pci_dev *pdev)
 	struct Scsi_Host *shost = sha->shost;
 
 	pm_runtime_get_noresume(dev);
-	del_timer_sync(&hisi_hba->timer);
 
 	sas_unregister_ha(sha);
 	flush_workqueue(hisi_hba->wq);
-- 
2.8.1



* Re: [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups
  2024-01-22  6:25 [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups chenxiang
                   ` (3 preceding siblings ...)
  2024-01-22  6:25 ` [PATCH 4/4] scsi: hisi_sas: Remove hisi_hba->timer for v3 hw chenxiang
@ 2024-01-25  2:14 ` Martin K. Petersen
  2024-01-30  2:26 ` Martin K. Petersen
  5 siblings, 0 replies; 7+ messages in thread
From: Martin K. Petersen @ 2024-01-25  2:14 UTC (permalink / raw)
  To: chenxiang; +Cc: jejb, martin.petersen, linux-scsi, linuxarm


> This series contains some fixes and cleanups including:
> - Fix a deadlock issue related to automatic debugfs;
> - Remove redundant checks for automatic debugfs;
> - Check whether debugfs is enabled before removing or releasing it;
> - Remove hisi_hba->timer for v3 hw;

Applied to 6.9/scsi-staging, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups
  2024-01-22  6:25 [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups chenxiang
                   ` (4 preceding siblings ...)
  2024-01-25  2:14 ` [PATCH 0/4] scsi: hisi_sas: Minor fixes and cleanups Martin K. Petersen
@ 2024-01-30  2:26 ` Martin K. Petersen
  5 siblings, 0 replies; 7+ messages in thread
From: Martin K. Petersen @ 2024-01-30  2:26 UTC (permalink / raw)
  To: jejb, chenxiang; +Cc: Martin K . Petersen, linux-scsi, linuxarm

On Mon, 22 Jan 2024 14:25:43 +0800, chenxiang wrote:

> This series contains some fixes and cleanups including:
> - Fix a deadlock issue related to automatic debugfs;
> - Remove redundant checks for automatic debugfs;
> - Check whether debugfs is enabled before removing or releasing it;
> - Remove hisi_hba->timer for v3 hw;
> 
> Xiang Chen (1):
>   scsi: hisi_sas: Remove hisi_hba->timer for v3 hw
> 
> [...]

Applied to 6.9/scsi-queue, thanks!

[1/4] scsi: hisi_sas: Fix a deadlock issue related to automatic dump
      https://git.kernel.org/mkp/scsi/c/3c4f53b2c341
[2/4] scsi: hisi_sas: Remove redundant checks for automatic debugfs dump
      https://git.kernel.org/mkp/scsi/c/3f0305504765
[3/4] scsi: hisi_sas: Check whether debugfs is enabled before removing or releasing it
      https://git.kernel.org/mkp/scsi/c/69097a631c03
[4/4] scsi: hisi_sas: Remove hisi_hba->timer for v3 hw
      https://git.kernel.org/mkp/scsi/c/f9242f166770

-- 
Martin K. Petersen	Oracle Linux Engineering

