Re: Performance drop on SCSI hard disk

public inbox for linux-kernel@vger.kernel.org

From: Jens Axboe <jaxboe@fusionio.com>
To: "Alex,Shi" <alex.shi@intel.com>
Cc: "James.Bottomley@hansenpartnership.com" 
	<James.Bottomley@hansenpartnership.com>,
	"Li, Shaohua" <shaohua.li@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Perfromance drop on SCSI hard disk
Date: Thu, 12 May 2011 22:29:52 +0200
Message-ID: <4DCC4340.6000407@fusionio.com>
In-Reply-To: <1305009600.21534.587.camel@debian>

On 2011-05-10 08:40, Alex,Shi wrote:
> commit c21e6beba8835d09bb80e34961 removed the REENTER flag and changed
> scsi_run_queue() to punt all requests on starved_list devices to
> kblockd. Yes, as Jens mentioned, performance on slow SCSI disks is
> hurt by this. :) (Intel SSDs aren't affected.)
> 
> In our testing on a 12-disk SAS JBOD, fio write with the sync ioengine
> dropped about 30~40% in throughput, and fio randread/randwrite with the
> aio ioengine dropped about 20%/50% in throughput. fio mmap testing was
> hurt as well.
> 
> With the following debug patch, the performance is fully recovered in
> our testing. But without the REENTER flag, in some corner cases, such as
> a device that keeps getting blocked and unblocked repeatedly,
> __blk_run_queue() may recursively call scsi_run_queue() and cause a
> kernel stack overflow.
> I don't know the details of the block device driver; I'm just wondering
> why SCSI needs the REENTER flag here. :)
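
On the question of what the REENTER flag bought us: its effect can be pictured with a small userspace toy model. This is a sketch under an assumption, namely that the flag roughly meant "if this queue is already being run further up the stack, punt the run to kblockd instead of recursing". Every name below (struct reenter_q, sim_request_fn(), guarded_run_queue(), sim_kblockd()) is hypothetical and is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy queue; not the kernel's struct request_queue. */
struct reenter_q {
	bool reenter;        /* set while request_fn is running */
	int depth;           /* current request_fn nesting depth */
	int max_depth;       /* deepest nesting observed */
	int runs;            /* total request_fn invocations */
	int deferred;        /* runs punted to the simulated kblockd */
	int requeues_left;   /* times request_fn will still requeue */
};

void guarded_run_queue(struct reenter_q *q);

/* Simulated request_fn: a requeue immediately re-runs the queue. */
void sim_request_fn(struct reenter_q *q)
{
	q->runs++;
	if (++q->depth > q->max_depth)
		q->max_depth = q->depth;
	if (q->requeues_left > 0) {
		q->requeues_left--;
		guarded_run_queue(q);   /* would recurse without the guard */
	}
	q->depth--;
}

/* Reentrancy guard: a nested run is deferred instead of recursing. */
void guarded_run_queue(struct reenter_q *q)
{
	if (q->reenter) {
		q->deferred++;          /* hand the run to the worker */
		return;
	}
	q->reenter = true;
	sim_request_fn(q);
	q->reenter = false;
	assert(q->depth == 0);      /* the guard kept request_fn from nesting */
}

/* Simulated kblockd: runs deferred work from a flat loop, not the stack. */
void sim_kblockd(struct reenter_q *q)
{
	while (q->deferred > 0) {
		q->deferred--;
		guarded_run_queue(q);
	}
}
```

With five pending requeues, guarded_run_queue() nests request_fn only once and the remaining runs happen from sim_kblockd()'s loop. The trade-off is that every deferred run waits for the worker instead of running immediately.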

This is a problem and we should do something about it for 2.6.39. I knew
that there would be cases where the async offload would cause a
performance degradation, but not to the extent that you are reporting.
You must be hitting the pathological case.

I can think of two scenarios where it could potentially recurse:

- request_fn is entered and ends up requeuing IO, then runs the queue
  at the end. Rinse, repeat.
- When running the starved list from request_fn, two (or more) devices
  could alternately recurse into each other.

The first case should be fairly easy to handle. The second one is
already handled by the local list splice.
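
Why the splice bounds the second case can be shown with a userspace toy model. The assumption is that running a starved device can end up walking the host's starved list again. In the splice variant, the outer walk first moves the whole list to a local copy, as scsi_run_queue() does with list_splice_init(), so a nested walk finds the shared list empty. The naive variant leaves entries on the shared list while running them and keeps re-entering. struct toy_host, run_starved(), run_device() and DEPTH_CAP are hypothetical names, and an array stands in for the kernel's list_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS  8
#define DEPTH_CAP 64   /* stop the naive variant before it blows the stack */

/* Hypothetical host with a starved list of device ids. */
struct toy_host {
	int starved[MAX_DEVS];
	int nr_starved;
	int depth, max_depth;
	bool overflowed;        /* stand-in for a kernel stack overflow */
};

void run_starved(struct toy_host *h, bool splice);

/* Running a starved device ends by walking the starved list again. */
void run_device(struct toy_host *h, int dev, bool splice)
{
	(void)dev;              /* the device's own I/O is not modelled */
	run_starved(h, splice);
}

void run_starved(struct toy_host *h, bool splice)
{
	if (++h->depth > h->max_depth)
		h->max_depth = h->depth;
	if (h->depth >= DEPTH_CAP) {
		h->overflowed = true;
		h->depth--;
		return;
	}

	if (splice) {
		/* Take the whole list; nested walks will see it empty. */
		int local[MAX_DEVS];
		int n = h->nr_starved;

		assert(n <= MAX_DEVS);
		memcpy(local, h->starved, sizeof(local[0]) * (size_t)n);
		h->nr_starved = 0;
		for (int i = 0; i < n && !h->overflowed; i++)
			run_device(h, local[i], splice);
	} else {
		/* Naive: entries stay on the shared list while they run. */
		for (int i = 0; i < h->nr_starved && !h->overflowed; i++)
			run_device(h, h->starved[i], splice);
		h->nr_starved = 0;
	}
	h->depth--;
}
```

With two starved devices, the splice variant nests only one level below the outer walk, while the naive variant keeps descending until DEPTH_CAP.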

Looking at the code, is this a real scenario? The only potential
recursion I see is:

scsi_request_fn()
        scsi_dispatch_cmd()
                scsi_queue_insert()
                        __scsi_queue_insert()
                                scsi_run_queue()
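
The chain above can be modelled in userspace to show how stack depth grows when the requeue path re-runs the queue synchronously. This is a sketch under simplifying assumptions: every dispatch that hits BUSY requeues and immediately kicks the queue, and one model frame stands for the whole scsi_request_fn() ... scsi_run_queue() chain (five or more real frames). The model_* names are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue state for the toy model. */
struct model_q {
	int busy_left;        /* dispatches that will still return BUSY */
	int depth, max_depth; /* request_fn nesting */
	int pending_async;    /* runs deferred to a worker */
	bool async_requeue;   /* false: blk_run_queue(), true: async run */
};

void model_request_fn(struct model_q *q);

/* Like __scsi_queue_insert(): requeue the command, then kick the queue. */
void model_queue_insert(struct model_q *q)
{
	assert(q->depth > 0);        /* only reached from request_fn */
	if (q->async_requeue)
		q->pending_async++;     /* deferred: the stack unwinds first */
	else
		model_request_fn(q);    /* synchronous: one level deeper */
}

/* Like scsi_request_fn() -> scsi_dispatch_cmd(): BUSY causes a requeue. */
void model_request_fn(struct model_q *q)
{
	if (++q->depth > q->max_depth)
		q->max_depth = q->depth;
	if (q->busy_left > 0) {
		q->busy_left--;
		model_queue_insert(q);
	}
	q->depth--;
}
```

With 100 consecutive BUSY results, the synchronous variant nests 101 levels deep, while the async variant stays at one level and leaves a single deferred run behind.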

Why are we even re-running the queue immediately on a BUSY condition?
It should only be needed if we have zero pending commands from this
particular queue, and for that particular case an async run is just fine,
since it's a rare condition (or performance would suck already).
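
The condition described above can be written down as a small decision helper. The patch below simply makes the requeue path async; this sketch instead illustrates the narrower rule from the prose. It assumes that commands still in flight on the device will re-run the queue when they complete, so only an idle device needs a kick. struct toy_sdev and requeue_after_busy() are hypothetical names, not SCSI midlayer API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; names are illustrative only. */
struct toy_sdev {
	unsigned int in_flight;    /* commands outstanding on the device */
	unsigned int queue_depth;  /* device's command limit */
	unsigned int async_runs;   /* queue runs punted to kblockd */
};

/*
 * Called after a command was requeued because of a BUSY condition.
 * Returns true if an (async) queue run was scheduled.
 */
bool requeue_after_busy(struct toy_sdev *sdev)
{
	assert(sdev->in_flight <= sdev->queue_depth);

	/* Assumed: outstanding commands re-run the queue as they complete. */
	if (sdev->in_flight > 0)
		return false;

	/* Nothing in flight to restart the queue: rare, so async is fine. */
	sdev->async_runs++;
	return true;
}
```

A busy device with three commands outstanding gets no extra run, and an idle device gets exactly one deferred run.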

And it should only really be needed for the 'q' being passed in, not the
others. Something like the patch below.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0bac91e..0b01c1f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -74,7 +74,7 @@ struct kmem_cache *scsi_sdb_cache;
  */
 #define SCSI_QUEUE_DELAY	3
 
-static void scsi_run_queue(struct request_queue *q);
+static void scsi_run_queue_async(struct request_queue *q);
 
 /*
  * Function:	scsi_unprep_request()
@@ -161,7 +161,7 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
 	blk_requeue_request(q, cmd->request);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
-	scsi_run_queue(q);
+	scsi_run_queue_async(q);
 
 	return 0;
 }
@@ -391,13 +391,14 @@ static inline int scsi_host_is_busy(struct Scsi_Host *shost)
  * Purpose:	Select a proper request queue to serve next
  *
  * Arguments:	q	- last request's queue
+ * 		async	- prevent potential request_fn recurse by running async
  *
  * Returns:     Nothing
  *
  * Notes:	The previous command was completely finished, start
  *		a new one if possible.
  */
-static void scsi_run_queue(struct request_queue *q)
+static void __scsi_run_queue(struct request_queue *q, bool async)
 {
 	struct scsi_device *sdev = q->queuedata;
 	struct Scsi_Host *shost;
@@ -438,13 +439,30 @@ static void scsi_run_queue(struct request_queue *q)
 			continue;
 		}
 
-		blk_run_queue_async(sdev->request_queue);
+		spin_unlock(shost->host_lock);
+		spin_lock(sdev->request_queue->queue_lock);
+		__blk_run_queue(sdev->request_queue);
+		spin_unlock(sdev->request_queue->queue_lock);
+		spin_lock(shost->host_lock);
 	}
 	/* put any unprocessed entries back */
 	list_splice(&starved_list, &shost->starved_list);
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
-	blk_run_queue(q);
+	if (async)
+		blk_run_queue_async(q);
+	else
+		blk_run_queue(q);
+}
+
+static void scsi_run_queue(struct request_queue *q)
+{
+	__scsi_run_queue(q, false);
+}
+
+static void scsi_run_queue_async(struct request_queue *q)
+{
+	__scsi_run_queue(q, true);
 }
 
 /*
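
In the patch's starved-list loop, host_lock is dropped before the device's queue lock is taken and __blk_run_queue() is called. The assumed reason is that scsi_request_fn() takes host_lock itself, and spinlocks are not recursive. The userspace toy model below shows that lock handoff together with the sync/async choice for 'q'. The "locks" are plain flags with assertions standing in for a self-deadlock, and all lk_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: taking it while held would self-deadlock. */
struct lk { bool held; };

void lk_acquire(struct lk *l) { assert(!l->held); l->held = true; }
void lk_release(struct lk *l) { assert(l->held); l->held = false; }

struct lk_host  { struct lk host_lock; int runs; };
struct lk_queue { struct lk queue_lock; struct lk_host *host; int async_runs; };

/* Loosely modelled on scsi_request_fn(): it takes the host lock itself. */
void lk_request_fn(struct lk_queue *q)
{
	assert(q->queue_lock.held);       /* called with the queue lock held */
	lk_release(&q->queue_lock);
	lk_acquire(&q->host->host_lock);  /* would trip if the caller held it */
	q->host->runs++;                  /* "dispatch" one command */
	lk_release(&q->host->host_lock);
	lk_acquire(&q->queue_lock);       /* return with the queue lock held */
}

void lk_scsi_run_queue(struct lk_queue *q, struct lk_queue **starved,
		       int nr_starved, bool async)
{
	struct lk_host *shost = q->host;

	lk_acquire(&shost->host_lock);
	for (int i = 0; i < nr_starved; i++) {
		struct lk_queue *sq = starved[i];

		/* Drop host_lock before running: request_fn retakes it. */
		lk_release(&shost->host_lock);
		lk_acquire(&sq->queue_lock);
		lk_request_fn(sq);              /* like __blk_run_queue() */
		lk_release(&sq->queue_lock);
		lk_acquire(&shost->host_lock);
	}
	lk_release(&shost->host_lock);

	if (async) {
		q->async_runs++;                /* like blk_run_queue_async() */
	} else {
		lk_acquire(&q->queue_lock);     /* like blk_run_queue() */
		lk_request_fn(q);
		lk_release(&q->queue_lock);
	}
}
```

Keeping host_lock held across the starved-device run would trip the assertion in lk_acquire(). That is the toy equivalent of the deadlock the unlock/relock sequence avoids.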

-- 
Jens Axboe

Thread overview: 17+ messages
2011-05-10  6:40 Perfromance drop on SCSI hard disk Alex,Shi
2011-05-10  6:52 ` Shaohua Li
2011-05-12  0:36   ` Shaohua Li
2011-05-12 20:29 ` Jens Axboe [this message]
2011-05-13  0:11   ` Alex,Shi
2011-05-13  0:48   ` Shaohua Li
2011-05-13  3:01     ` Shaohua Li
2011-05-16  8:04       ` Shaohua Li
2011-05-16  8:37         ` Alex,Shi
2011-05-17  6:09           ` Alex,Shi
2011-05-17  7:20             ` Jens Axboe
2011-05-19  8:26               ` Alex,Shi
2011-05-19  8:47                 ` Alex,Shi
2011-05-19 18:27                 ` Jens Axboe
2011-05-20  0:22                   ` Alex,Shi
2011-05-20  0:40                     ` Shaohua Li
2011-05-20  5:17                       ` Alex,Shi
