public inbox for linux-scsi@vger.kernel.org
From: Patrick Mansfield <patmans@us.ibm.com>
To: James Bottomley <James.Bottomley@steeleye.com>
Cc: SCSI Mailing List <linux-scsi@vger.kernel.org>
Subject: Re: [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached
Date: Wed, 5 Mar 2003 10:43:20 -0800	[thread overview]
Message-ID: <20030305104320.A14722@beaverton.ibm.com> (raw)
In-Reply-To: <1046833360.2757.43.camel@mulgrave>; from James.Bottomley@steeleye.com on Wed, Mar 05, 2003 at 04:02:38AM +0100

On Wed, Mar 05, 2003 at 04:02:38AM +0100, James Bottomley wrote:
> 
> Could you elaborate on why a pending_queue (which duplicates some of the
> block layer queueing functionality that we use) is a good idea.

> Under the current scheme, we prep one command beyond the can_queue limit
> and leave it in the block queue, so the returning commands can restart
> with a fully prepped command but we still leave all the others in the
> block queue for potential elevator merging. 

Note that if we go over can_queue, performance can suffer no matter what we
do in scsi core. If the bandwidth or another limit of the adapter is
reached, no change in scsi core can fix that; all we can do is make sure
each scsi_device can do some IO. So we are trying to figure out a good way
to make sure all devices can do IO when can_queue is hit.

(Not sure if you implied the following change) The host pending_cmd queue
could be replaced in the future with a (block) request queue for each
LLDD, without much change in function - we would still have to pull
requests off of the scsi_device queue before putting them into any LLDD
request queue, so we still would not be able to leave requests in the
scsi_device queue.  We could try to "sort" the LLDD queue so we have a mix
of scsi_devices represented, but that could lead to other issues.
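
As a rough user-space sketch of the host pending_cmd queue idea (this is
illustrative, not actual kernel code - the structures, CAN_QUEUE value, and
function names are made up), commands pulled off a device queue would go
onto a per-host software queue once the adapter is saturated, and be
restarted on completion:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of a per-host pending command queue: a command is
 * sent to the adapter if fewer than CAN_QUEUE are in flight, otherwise
 * it is appended to a host-level FIFO and restarted on completion. */

#define CAN_QUEUE 2

struct cmd {
	int id;
	struct cmd *next;
};

struct host {
	int in_flight;            /* commands sent to the adapter */
	struct cmd *pending_head; /* software queue, FIFO order */
	struct cmd *pending_tail;
};

/* Try to start a command; enqueue it if the adapter is saturated. */
static void host_queue_cmd(struct host *h, struct cmd *c)
{
	if (h->in_flight < CAN_QUEUE) {
		h->in_flight++;	/* here we would call the LLDD's queuecommand */
	} else {
		c->next = NULL;
		if (h->pending_tail)
			h->pending_tail->next = c;
		else
			h->pending_head = c;
		h->pending_tail = c;
	}
}

/* On completion, restart one pending command if any is waiting. */
static void host_cmd_done(struct host *h)
{
	h->in_flight--;
	if (h->pending_head) {
		struct cmd *c = h->pending_head;
		h->pending_head = c->next;
		if (!h->pending_head)
			h->pending_tail = NULL;
		h->in_flight++;
	}
}
```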

Going to a block request queue now might be hard - it would likely need a
further separation of scsi_device and scsi_host within scsi core (in the
request and prep functions, and in the IO completion path).

With multiple starved devices, all of them with IO requests pending, the
algorithm we have now (assuming it worked correctly) can unfairly let each
scsi_device keep as many commands outstanding as it had when we hit the
starved state.

The current algorithm could be fixed and throttling added.
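
One possible fix would be to restart starved devices round-robin, so freed
adapter slots are spread across devices rather than re-captured by whoever
was busiest. A minimal user-space sketch (the device array and counters are
illustrative, not the real scsi_device fields):

```c
#include <assert.h>

/* Illustrative model: when the host drops below can_queue, hand the
 * freed slots to starved devices one at a time, round-robin, resuming
 * after the device that received the previous slot. */

struct dev_model {
	int pending; /* requests waiting on this device's queue */
	int issued;  /* requests actually dispatched */
};

static void restart_starved(struct dev_model *devs, int ndev,
			    int *last, int slots)
{
	int tried = 0;

	while (slots > 0 && tried < ndev) {
		int d = (*last + 1) % ndev;

		*last = d;
		if (devs[d].pending > 0) {
			devs[d].pending--;
			devs[d].issued++;
			slots--;
			tried = 0; /* made progress, keep rotating */
		} else {
			tried++;   /* give up once a full pass finds nothing */
		}
	}
}
```

With one busy device and two lightly loaded ones, each gets a slot before
the busy device gets a second, instead of the busy device taking them all.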

Today, many of the host adapter drivers avoid the can_queue issue by
limiting the queue_depth (such as ips.c), or by having their own software
queue (qlogic's qla driver).

Pros of a host pending_cmd queue versus throttling across (effectively) a
queue of queues:

1) Simpler (and perhaps faster) code.

2) We don't need any throttling, and can always keep can_queue IO's in
flight. With per-host throttling, we sometimes must keep the number of
IO's in flight below can_queue in order to be fair (in cases where one
device is busier than another; there could be a way around this, but I
haven't figured it out).

3) The queue_depth setting can be used to limit IO across scsi_devices on
a single adapter - if you are hitting can_queue, setting a lower
queue_depth for some devices will effectively lower the bandwidth
available to them. (Given that we can modify these on a per-scsi_device
basis; I would really like a writable sysfs sdev->queue_depth).

4) Allows better separation of scsi_device code from scsi_host, for
example, it simplifies going to a per-device queue lock.
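
A toy dispatch loop illustrates point 3: when devices compete for a shared
can_queue's worth of adapter slots, a smaller queue_depth directly caps a
device's share (the numbers and structure here are made up for
illustration):

```c
#include <assert.h>

/* Toy model: ndev devices with unlimited demand compete for can_queue
 * adapter slots; each device may hold at most queue_depth[i] of them.
 * A lower queue_depth translates directly into a smaller share. */

static void share_slots(const int *queue_depth, int ndev,
			int can_queue, int *granted)
{
	int i, d;

	for (i = 0; i < ndev; i++)
		granted[i] = 0;

	/* hand out slots round-robin until can_queue is used up or
	 * every device has reached its queue_depth */
	while (can_queue > 0) {
		int progress = 0;

		for (d = 0; d < ndev && can_queue > 0; d++) {
			if (granted[d] < queue_depth[d]) {
				granted[d]++;
				can_queue--;
				progress = 1;
			}
		}
		if (!progress)
			break;
	}
}
```

With can_queue of 8 and queue_depths of 8 and 2, the second device is held
to 2 of the 8 slots no matter how busy it is.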

Cons:

1) Does not leave requests on the block queue, where they could be
merged/sorted with incoming commands.

2) Might lead to grouping of IO per scsi_device rather than spreading IO
across all devices. Hopefully this would balance out over the long term,
or at least not degrade performance (sequential IO might speed up, while
random IO might slow down); with IO patterns random enough across the
scsi_device queues (not within a given queue), we would end up spreading
IO across all queues over time.

Given that we are talking about what to do when a hardware limit is
reached (can_queue) I would rather go with the simpler approach.

-- Patrick Mansfield


Thread overview: 18+ messages
2003-02-28 19:19 [RFC][PATCH] scsi-misc-2.5 software enqueue when can_queue reached Patrick Mansfield
2003-03-02  8:57 ` Christoph Hellwig
2003-03-02 18:15   ` Patrick Mansfield
2003-03-03 15:52   ` Randy.Dunlap
2003-03-03 18:17   ` Luben Tuikov
2003-03-04  1:11     ` Andrew Morton
2003-03-04  4:49       ` Luben Tuikov
2003-03-02 20:57 ` Luben Tuikov
2003-03-02 21:08   ` Luben Tuikov
2003-03-03 20:52   ` Patrick Mansfield
2003-03-03 22:40     ` Luben Tuikov
2003-03-03 23:41       ` Patrick Mansfield
2003-03-04  5:48         ` Luben Tuikov
2003-03-05  3:02 ` James Bottomley
2003-03-05 18:43   ` Patrick Mansfield [this message]
2003-03-06 15:57     ` James Bottomley
2003-03-06 17:41       ` Patrick Mansfield
2003-03-06 18:04         ` James Bottomley
