From: Ming Lei <ming.lei@redhat.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
"Ewan D. Milne" <emilne@redhat.com>,
Hannes Reinecke <hare@suse.de>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
"James E . J . Bottomley" <jejb@linux.ibm.com>,
linux-scsi@vger.kernel.org,
Sathya Prakash <sathya.prakash@broadcom.com>,
Chaitra P B <chaitra.basappa@broadcom.com>,
Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>,
Kashyap Desai <kashyap.desai@broadcom.com>,
Sumit Saxena <sumit.saxena@broadcom.com>,
Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
Christoph Hellwig <hch@lst.de>,
Bart Van Assche <bart.vanassche@wdc.com>
Subject: Re: [PATCH 4/4] scsi: core: don't limit per-LUN queue depth for SSD
Date: Fri, 22 Nov 2019 11:24:32 +0800 [thread overview]
Message-ID: <20191122032432.GB903@ming.t460p> (raw)
In-Reply-To: <yq1pnhkbopi.fsf@oracle.com>
Hi Martin,
On Thu, Nov 21, 2019 at 09:59:53PM -0500, Martin K. Petersen wrote:
>
> Ming,
>
> > I don't understand the motivation of ramp-up/ramp-down, maybe it is just
> > for fairness among LUNs.
>
> Congestion control. Devices have actual, physical limitations that are
> different from the tag context limitations on the HBA. You don't have
> that problem on NVMe because (at least for PCIe) the storage device and
> the controller are one and the same.
>
> If you submit 100000 concurrent requests to a SCSI drive that does 100
> IOPS, some requests will time out before they get serviced.
> Consequently we have the ability to raise and lower the queue depth to
> constrain the amount of requests in flight to a given device at any
> point in time.
blk-mq has already puts a limit on each LUN, the number is
host_queue_depth / nr_active_LUNs, see hctx_may_queue().
Looks this way works for NVMe, that is why I try to bypass
.device_busy for SSD which is too expensive on fast storage. Even
Hannes wants to kill it completely.
>
> Also, devices use BUSY/QUEUE_FULL/TASK_SET_FULL to cause the OS to back
> off. We frequently see issues where the host can submit burst I/O much
> faster than the device can de-stage from cache. In that scenario the
> device reports BUSY/QF/TSF and we will back off so the device gets a
> chance to recover. If we just let the application submit new I/O without
> bounds, the system would never actually recover.
>
> Note that the actual, physical limitations for how many commands a
> target can handle are typically much, much lower than the number of tags
> the HBA can manage. SATA devices can only express 32 concurrent
> commands. SAS devices typically 128 concurrent commands per
> port. Arrays differ.
I understand SATA's host queue depth is set as 32.
But SAS HBA's queue depth is often big, so do we reply on .device_busy for
throttling requests to SAS?
>
> If we ignore the RAID controller use case where the controller
> internally queues and arbitrates commands between many devices, how is
> submitting 1000 concurrent requests to a device which only has 128
> command slots going to work?
For SSD, I guess it might be fine, given NVMe sets per-hw-queue depth
as 1023 usually. That means the concurrent requests can be as many as
1023 * nr_hw_queues in case of single namespace.
>
> Some HBAs have special sauce to manage BUSY/QF/TSF, some don't. If we
> blindly stop restricting the number of I/Os in flight in the ML, we may
> exceed either the capabilities of what the transport protocol can
> express or internal device resources.
OK, one conservative approach may be just to just bypass .device_busy
in case of SSD only for some high end HBA.
Or maybe we can wire up sdev->queue_depth with block layer's scheduler
queue depth? One issue is that sdev->queue_depth may be updated some
times.
Thanks,
Ming
next prev parent reply other threads:[~2019-11-22 3:24 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-18 10:31 [PATCH 0/4] scis: don't apply per-LUN queue depth for SSD Ming Lei
2019-11-18 10:31 ` [PATCH 1/4] scsi: megaraid_sas: use private counter for tracking inflight per-LUN commands Ming Lei
2019-11-20 9:54 ` Hannes Reinecke
2019-11-26 3:12 ` Kashyap Desai
2019-11-26 3:37 ` Ming Lei
2019-12-05 10:32 ` Kashyap Desai
2019-11-18 10:31 ` [PATCH 2/4] scsi: mpt3sas: " Ming Lei
2019-11-20 9:55 ` Hannes Reinecke
2019-11-18 10:31 ` [PATCH 3/4] scsi: sd: register request queue after sd_revalidate_disk is done Ming Lei
2019-11-20 9:59 ` Hannes Reinecke
2019-11-18 10:31 ` [PATCH 4/4] scsi: core: don't limit per-LUN queue depth for SSD Ming Lei
2019-11-20 10:05 ` Hannes Reinecke
2019-11-20 17:00 ` Ewan D. Milne
2019-11-20 20:56 ` Bart Van Assche
2019-11-20 21:36 ` Ewan D. Milne
2019-11-22 2:25 ` Martin K. Petersen
2019-11-21 1:07 ` Ming Lei
2019-11-22 2:59 ` Martin K. Petersen
2019-11-22 3:24 ` Ming Lei [this message]
2019-11-22 16:38 ` Sumanesh Samanta
2019-11-21 0:08 ` Sumanesh Samanta
2019-11-21 0:54 ` Ming Lei
2019-11-21 19:19 ` Ewan D. Milne
2019-11-21 0:53 ` Ming Lei
2019-11-21 15:45 ` Hannes Reinecke
2019-11-22 8:09 ` Ming Lei
2019-11-22 18:14 ` Bart Van Assche
2019-11-22 18:26 ` James Smart
2019-11-22 20:46 ` Bart Van Assche
2019-11-22 22:04 ` Ming Lei
2019-11-22 22:00 ` Ming Lei
2019-11-25 18:28 ` Ewan D. Milne
2019-11-25 22:14 ` James Smart
2019-11-22 2:18 ` Martin K. Petersen
-- strict thread matches above, loose matches on Subject: below --
2019-11-20 21:58 Sumanesh Samanta
2019-11-21 1:21 ` Ming Lei
2019-11-21 1:50 ` Sumanesh Samanta
2019-11-21 2:23 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191122032432.GB903@ming.t460p \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=bart.vanassche@wdc.com \
--cc=bvanassche@acm.org \
--cc=chaitra.basappa@broadcom.com \
--cc=emilne@redhat.com \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=jejb@linux.ibm.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=sathya.prakash@broadcom.com \
--cc=shivasharan.srikanteshwara@broadcom.com \
--cc=suganath-prabu.subramani@broadcom.com \
--cc=sumit.saxena@broadcom.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox