From: Maxime Bizon <mbizon@freebox.fr>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-usb@vger.kernel.org, usb-storage@lists.one-eyed-alien.net,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Reproducible deadlock when usb-storage scsi command timeouts twice
Date: Wed, 26 Apr 2023 19:55:42 +0200 [thread overview]
Message-ID: <ZEllnjMKT8ulZbJh@sakura> (raw)
Hello,
I have a faulty 128MB USB stick that happens to randomly timeout when
reading a specific block.
After the first timeout, the scsi layer retries the command. If that
retry work, then everything goes well. But if the retried command
timeouts as well, then I get a deadlock.
To reproduce fast & reliable, I'm using these settings:
# echo 10 >/proc/sys/kernel/hung_task_timeout_secs
# echo 0x3f > /proc/sys/dev/scsi/logging_level
# cd /sys/block/sda/queue/
# echo none > scheduler
# echo 1 > nr_requests
# echo 1 > nomerges
# echo 4 > max_sectors_kb
# echo 5000 > io_timeout
# cat /dev/sda > /dev/null
dmesg with timestamps so it's easy to spot the timeouts:
[ 211.732418] *** thread awakened
[ 211.732429] Command READ_10 (10 bytes)
[ 211.732438] bytes: 28 00 00 02 01 f8 00 00 08 00
[ 211.732447] Bulk Command S 0x43425355 T 0xb8ca L 4096 F 128 Trg 0 LUN 0 CL 10
[ 211.732459] xfer 31 bytes
[ 211.732539] Status code 0; transferred 31/31
[ 211.732550] -- transfer complete
[ 211.732557] Bulk command transfer result=0
[ 211.732564] xfer 4096 bytes, 1 entries
[ 211.732913] Status code 0; transferred 4096/4096
[ 211.732925] -- transfer complete
[ 211.732931] Bulk data transfer result 0x0
[ 211.732938] Attempting to get CSW...
[ 211.732944] xfer 13 bytes
[ 211.733036] Status code 0; transferred 13/13
[ 211.733047] -- transfer complete
[ 211.733054] Bulk status result = 0
[ 211.733061] Bulk Status S 0x53425355 T 0xb8ca R 0 Stat 0x0
[ 211.733071] scsi cmd done, result=0x0
[ 211.733092] *** thread sleeping
[ 211.733162] *** thread awakened
[ 211.733174] Command READ_10 (10 bytes)
[ 211.733183] bytes: 28 00 00 02 02 00 00 00 08 00
[ 211.733192] Bulk Command S 0x43425355 T 0xb8cb L 4096 F 128 Trg 0 LUN 0 CL 10
[ 211.733204] xfer 31 bytes
[ 211.733288] Status code 0; transferred 31/31
[ 211.733300] -- transfer complete
[ 211.733306] Bulk command transfer result=0
[ 211.733313] xfer 4096 bytes, 1 entries
[ 218.089304] sd 0:0:0:0: [sda] tag#0 abort scheduled
[ 218.109297] sd 0:0:0:0: [sda] tag#0 aborting command
[ 218.109315] command_abort called
[ 218.109324] -- cancelling sg request
[ 218.112380] Status code -104; transferred 3072/4096
[ 218.112393] -- transfer cancelled
[ 218.112400] Bulk data transfer result 0x4
[ 218.112407] -- command was aborted
[ 218.209278] usb 1-1.2: reset high-speed USB device number 3 using orion-ehci
[ 218.359923] usb_reset_device returns 0
[ 218.359936] scsi command aborted
[ 218.359947] *** thread sleeping
[ 218.359964] sd 0:0:0:0: [sda] tag#0 retry aborted command
[ 218.399298] *** thread awakened
[ 218.399311] Command READ_10 (10 bytes)
[ 218.399320] bytes: 28 00 00 02 02 00 00 00 08 00
[ 218.399330] Bulk Command S 0x43425355 T 0xb8cc L 4096 F 128 Trg 0 LUN 0 CL 10
[ 218.399342] xfer 31 bytes
[ 218.399544] Status code 0; transferred 31/31
[ 218.399556] -- transfer complete
[ 218.399562] Bulk command transfer result=0
[ 218.399570] xfer 4096 bytes, 1 entries
[ 225.129297] sd 0:0:0:0: [sda] tag#0 previous abort failed
[ 225.129337] scsi host0: Waking error handler thread
[ 225.129358] scsi host0: scsi_eh_0: waking up 0/1/1
[ 225.129375] scsi host0: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
[ 225.129387] scsi host0: Total of 1 commands on 1 devices require eh work
[ 225.129402] sd 0:0:0:0: scsi_eh_0: Sending BDR
[ 225.129414] device_reset called
[ 235.369290] INFO: task scsi_eh_0:173 blocked for more than 10 seconds.
[ 235.369311] Not tainted 6.3.0-00615-gffe64935a4a2 #9
[ 235.369320] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 235.369327] task:scsi_eh_0 state:D stack:0 pid:173 ppid:2 flags:0x00000000
[ 235.369350] __schedule from schedule+0x70/0xac
[ 235.369374] schedule from schedule_preempt_disabled+0x24/0x34
[ 235.369393] schedule_preempt_disabled from __mutex_lock.constprop.0+0x14c/0x280
[ 235.369412] __mutex_lock.constprop.0 from device_reset+0x28/0x64
[ 235.369433] device_reset from scsi_try_bus_device_reset+0x24/0x58
[ 235.369452] scsi_try_bus_device_reset from scsi_eh_ready_devs+0x2f0/0x97c
[ 235.369470] scsi_eh_ready_devs from scsi_error_handler+0x238/0x49c
[ 235.369488] scsi_error_handler from kthread+0xc4/0xdc
[ 235.369507] kthread from ret_from_fork+0x14/0x3c
[ 235.369567] INFO: task usb-storage:176 blocked for more than 10 seconds.
[ 235.369576] Not tainted 6.3.0-00615-gffe64935a4a2 #9
[ 235.369583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 235.369590] task:usb-storage state:D stack:0 pid:176 ppid:2 flags:0x00000000
[ 235.369606] __schedule from schedule+0x70/0xac
[ 235.369622] schedule from schedule_timeout+0x18/0xd0
[ 235.369641] schedule_timeout from __wait_for_common+0xc0/0x13c
[ 235.369660] __wait_for_common from usb_sg_wait+0x10c/0x118
[ 235.369677] usb_sg_wait from usb_stor_bulk_transfer_sglist+0xc4/0x118
[ 235.369695] usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x24/0x3c
[ 235.369713] usb_stor_bulk_srb from usb_stor_Bulk_transport+0x164/0x44c
[ 235.369731] usb_stor_Bulk_transport from usb_stor_invoke_transport+0x20/0x5c4
[ 235.369750] usb_stor_invoke_transport from usb_stor_control_thread+0x1a4/0x2bc
[ 235.369769] usb_stor_control_thread from kthread+0xc4/0xdc
[ 235.369786] kthread from ret_from_fork+0x14/0x3c
Turns out eh_device_reset_handler() is called with a pending command
(srb == us->srb), and I don't know if the usb code was expecting
eh_abort_handler() to be called first.
This patch fixes the issue, not sure if it's correct:
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -455,6 +455,9 @@ static int device_reset(struct scsi_cmnd *srb)
usb_stor_dbg(us, "%s called\n", __func__);
+ if (us->srb == srb)
+ command_abort(srb);
+
/* lock the device pointers and do the reset */
mutex_lock(&(us->dev_mutex));
result = us->transport_reset(us);
--
Maxime
next reply other threads:[~2023-04-26 17:55 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-26 17:55 Maxime Bizon [this message]
2023-04-26 19:20 ` Reproducible deadlock when usb-storage scsi command timeouts twice Alan Stern
2023-04-26 23:06 ` Maxime Bizon
2023-04-30 19:39 ` Bart Van Assche
2023-04-30 21:10 ` Alan Stern
2023-05-03 10:24 ` Benjamin Block
2023-05-03 10:55 ` Oliver Neukum
2023-05-03 12:51 ` Benjamin Block
2023-05-03 13:54 ` Oliver Neukum
2023-05-03 14:25 ` Alan Stern
2023-05-04 8:52 ` Benjamin Block
2023-05-04 14:05 ` Alan Stern
2023-05-04 14:41 ` Maxime Bizon
2023-05-04 20:25 ` Alan Stern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZEllnjMKT8ulZbJh@sakura \
--to=mbizon@freebox.fr \
--cc=gregkh@linuxfoundation.org \
--cc=linux-usb@vger.kernel.org \
--cc=stern@rowland.harvard.edu \
--cc=usb-storage@lists.one-eyed-alien.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox