From: Tony Battersby <tonyb@cybernetics.com>
To: linux-scsi@vger.kernel.org,
"James E.J. Bottomley" <JBottomley@parallels.com>,
Christoph Hellwig <hch@infradead.org>,
Jens Axboe <axboe@kernel.dk>,
Douglas Gilbert <dgilbert@interlog.com>
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH] [SCSI] sg: fix unkillable I/O wait deadlock with scsi-mq
Date: Wed, 11 Feb 2015 11:31:33 -0500
Message-ID: <54DB83E5.6020205@cybernetics.com>

When using the write()/read() interface for submitting commands, the
SCSI generic driver does not call blk_put_request() on a completed SCSI
command until userspace calls read() to get the command completion.
Since scsi-mq uses a fixed number of preallocated requests, this makes
it possible for userspace to exhaust the entire preallocated supply of
requests, leading to deadlock with the user process stuck in a permanent
unkillable I/O wait in sg_write() -> ... -> blk_get_request() -> ... ->
bt_get(). Note that this deadlock can happen only if scsi-mq is
enabled. Prevent the deadlock by calling blk_put_request() as soon as
the SCSI command completes instead of waiting for userspace to call read().

Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

For inclusion in kernel 3.20.

I encountered this problem using mptsas (can_queue == 127) and 8 disks
connected via an expander. I have a test program called cydiskbench
that spawns multiple threads, opens multiple /dev/sg* file descriptors,
and sends multiple disk read/write commands to each /dev/sg* file
descriptor. I can vary the number of disks being tested and the command
queue depth per disk. Whenever I chose test parameters such that
(n_disks * queue_depth_per_disk) > shost->can_queue, the test deadlocked
as described when scsi-mq was enabled but worked just fine with scsi-mq
disabled.

I will send a separate patch to fix the same problem in the bsg driver.

--- linux-3.19.0/drivers/scsi/sg.c.orig 2015-02-08 21:54:22.000000000 -0500
+++ linux-3.19.0/drivers/scsi/sg.c 2015-02-09 17:40:00.000000000 -0500
@@ -1350,6 +1350,17 @@ sg_rq_end_io(struct request *rq, int upt
 	}
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
+	/*
+	 * Free the request as soon as it is complete so that its resources
+	 * can be reused without waiting for userspace to read() the
+	 * result.  But keep the associated bio (if any) around until
+	 * blk_rq_unmap_user() can be called from user context.
+	 */
+	srp->rq = NULL;
+	if (rq->cmd != rq->__cmd)
+		kfree(rq->cmd);
+	__blk_put_request(rq->q, rq);
+
 	write_lock_irqsave(&sfp->rq_list_lock, iflags);
 	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
@@ -1777,10 +1788,10 @@ sg_finish_rem_req(Sg_request *srp)
 	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
 				      "sg_finish_rem_req: res_used=%d\n",
 				      (int) srp->res_used));
+	if (srp->bio)
+		ret = blk_rq_unmap_user(srp->bio);
+
 	if (srp->rq) {
-		if (srp->bio)
-			ret = blk_rq_unmap_user(srp->bio);
-
 		if (srp->rq->cmd != srp->rq->__cmd)
 			kfree(srp->rq->cmd);
 		blk_put_request(srp->rq);