Interrupt strangeness in scsi_request

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Interrupt strangeness in scsi_request_fn()
@ 2014-08-01 20:48 Jan Kara
  2014-08-01 22:00 ` James Bottomley
  0 siblings, 1 reply; 2+ messages in thread
From: Jan Kara @ 2014-08-01 20:48 UTC (permalink / raw)
  To: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1481 bytes --]

  Hello,

  when debugging one bug, I've noticed one strangeness in
scsi_request_fn(). We enter it with interrupts disabled and queue_lock
held. In the function we do stuff like:
                spin_unlock_irq(shost->host_lock);
                /*
                 * Finally, initialize any error handling parameters, and
                 * set up
                 * the timers for timeouts.
                 */
                scsi_init_cmd_errh(cmd);

                /*
                 * Dispatch the command to the low-level driver.
                 */
                rtn = scsi_dispatch_cmd(cmd);
                spin_lock_irq(q->queue_lock);

Why do we enable interrupts there? The thing is that scsi_request_fn() can
be called from IO completion (thus softirq context) and if we enable
interrupts there, another HW interrupt can interrupt softirq processing
which seems wrong to me.

Now to admit my motivation I have reports from one customer (SLES kernel so
too old to be really interesting upstream but still of some value I
believe) where he's seeing softlockup reports when fully loading his SAN
and from the crash dumps it seems the machine simply spends too long in IO
completion because HW interrupts keep interrupting it plus there's
contention on shost->host_lock. If I change scsi_request_fn() to never
enable interrupts problems go away.

So what would people say to something like the attached patch?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-scsi-Keep-interrupts-disabled-while-submitting-reque.patch --]
[-- Type: text/x-patch, Size: 2337 bytes --]

>From a7c74ea9375cf0b04a11ac3f00bcd835a8499422 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Fri, 1 Aug 2014 22:45:32 +0200
Subject: [PATCH] scsi: Keep interrupts disabled while submitting requests

scsi_request_fn() can be called from softirq context during IO
completion. If it enables interrupts there, HW interrupts can interrupt
softirq processing and queue more IO completion work which can
eventually lead to softlockup reports because IO completion softirq runs
for too long. Keep interrupts disabled in scsi_request_fn().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/scsi/scsi_lib.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f7e316368c99..44b867e9adc9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1481,7 +1481,8 @@ static void scsi_softirq_done(struct request *rq)
  *
  * Returns:     Nothing
  *
- * Lock status: IO request lock assumed to be held when called.
+ * Lock status: IO request lock assumed to be held when called, interrupts
+ * must be disabled.
  */
 static void scsi_request_fn(struct request_queue *q)
 	__releases(q->queue_lock)
@@ -1563,7 +1564,7 @@ static void scsi_request_fn(struct request_queue *q)
 		 * XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
 		 *		take the lock again.
 		 */
-		spin_unlock_irq(shost->host_lock);
+		spin_unlock(shost->host_lock);
 
 		/*
 		 * Finally, initialize any error handling parameters, and set up
@@ -1575,7 +1576,7 @@ static void scsi_request_fn(struct request_queue *q)
 		 * Dispatch the command to the low-level driver.
 		 */
 		rtn = scsi_dispatch_cmd(cmd);
-		spin_lock_irq(q->queue_lock);
+		spin_lock(q->queue_lock);
 		if (rtn)
 			goto out_delay;
 	}
@@ -1583,7 +1584,7 @@ static void scsi_request_fn(struct request_queue *q)
 	return;
 
  not_ready:
-	spin_unlock_irq(shost->host_lock);
+	spin_unlock(shost->host_lock);
 
 	/*
 	 * lock q, handle tag, requeue req, and decrement device_busy. We
@@ -1593,7 +1594,7 @@ static void scsi_request_fn(struct request_queue *q)
 	 * cases (host limits or settings) should run the queue at some
 	 * later time.
 	 */
-	spin_lock_irq(q->queue_lock);
+	spin_lock(q->queue_lock);
 	blk_requeue_request(q, req);
 	sdev->device_busy--;
 out_delay:
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: Interrupt strangeness in scsi_request_fn()
  2014-08-01 20:48 Interrupt strangeness in scsi_request_fn() Jan Kara
@ 2014-08-01 22:00 ` James Bottomley
  0 siblings, 0 replies; 2+ messages in thread
From: James Bottomley @ 2014-08-01 22:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-scsi

On Fri, 2014-08-01 at 22:48 +0200, Jan Kara wrote:
>   Hello,
> 
>   when debugging one bug, I've noticed one strangeness in
> scsi_request_fn(). We enter it with interrupts disabled and queue_lock
> held. In the function we do stuff like:
>                 spin_unlock_irq(shost->host_lock);
>                 /*
>                  * Finally, initialize any error handling parameters, and
>                  * set up
>                  * the timers for timeouts.
>                  */
>                 scsi_init_cmd_errh(cmd);
> 
>                 /*
>                  * Dispatch the command to the low-level driver.
>                  */
>                 rtn = scsi_dispatch_cmd(cmd);
>                 spin_lock_irq(q->queue_lock);
> 
> Why do we enable interrupts there? The thing is that scsi_request_fn() can
> be called from IO completion (thus softirq context) and if we enable
> interrupts there, another HW interrupt can interrupt softirq processing
> which seems wrong to me.

According to my reading of kernel/softirq.c, it looks like interrupts
*are* already enabled when softirq functions are called by design.  That
doesn't mean your patch is necessarily wrong, it just means the premise
above isn't quite right.

> Now to admit my motivation I have reports from one customer (SLES kernel so
> too old to be really interesting upstream but still of some value I
> believe) where he's seeing softlockup reports when fully loading his SAN
> and from the crash dumps it seems the machine simply spends too long in IO
> completion because HW interrupts keep interrupting it plus there's
> contention on shost->host_lock. If I change scsi_request_fn() to never
> enable interrupts problems go away.
> 
> So what would people say to something like the attached patch?

I think the theory for enabling interrupts here is that the path is
rather long ... it goes through the driver to the hardware and that the
problem you describe above *should* be self correcting.  If we keep
interrupting dispatch, then we're throttling the outbound I/O which
should reduce the returning interrupts.

I'm not immovably opposed, I'd just like to see better justification.

James



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2014-08-01 22:00 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-01 20:48 Interrupt strangeness in scsi_request_fn() Jan Kara
2014-08-01 22:00 ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).