From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
To: Bart Van Assche <bvanassche@acm.org>,
Niklas Cassel <niklas.cassel@wdc.com>,
Paolo Valente <paolo.valente@linaro.org>,
Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>, Hannes Reinecke <hare@suse.de>,
linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
linux-block@vger.kernel.org
Subject: Re: [PATCH v3 01/18] block: introduce duration-limits priority class
Date: Wed, 25 Jan 2023 07:59:46 +0900 [thread overview]
Message-ID: <4c345d8b-7efa-85c9-fe1c-1124ea5d9de6@opensource.wdc.com> (raw)
In-Reply-To: <873e0213-94b5-0d81-a8aa-4671241e198c@acm.org>
On 1/25/23 07:43, Bart Van Assche wrote:
> On 1/24/23 13:29, Damien Le Moal wrote:
>> I/O priority at the device level does not exist with SAS and with SATA,
>> the ACS specifications mandates that NCQ I/O priority and CDL cannot be
>> used mixed together. So from the device point of view, I/O priority and
>> CDL are mutually exclusive. No issues.
>>
>> Now, if you are talking about the host level I/O priority scheduling done
>> by mq-deadline and bfq, the CDL priority class maps to the RT class. They
>> are the same, as they should. There is nothing more real-time than CDL in
>> my opinion :)
>>
>> Furthermore, if we do not reuse the I/O priority interface, we will have
>> to add another field to BIOs & requests to propagate the cdl index from
>> user space down to the device LLD, almost exactly in the manner of I/O
>> priorities, including all the controls with merging etc. That would be a
>> lot of overhead to achieve the possibility of prioritized CDL commands.
>>
>> CDL in of itself allows the user to define "prioritized" commands by
>> defining CDLs on the drive that are sorted in increasing time limit order,
>> i.e. with low CDL index numbers having low limits, and higher priority
>> within the class (as CDL index == prio level). With that, schedulers can
>> still do the right thing as they do now, with the additional benefit that
>> they can even be improved to base their scheduling decisions on a known
>> time limit for the command execution. But such optimization is not
>> implemented by this series.
>
> Hi Damien,
>
> What if a device that supports I/O priorities (e.g. an NVMe device that
> supports configuring the SQ priority) and a device that supports command
> duration limits (e.g. a SATA hard disk) are combined via the device
> mapper into a single block device? Should I/O be submitted to the dm
> device with one of the existing I/O priority classes (not supported by
> SATA hard disks) or with I/O priority class IOPRIO_CLASS_DL (not
> supported by NVMe devices)?
That is not a use case we considered. My gut feeling is that this is
something the target driver should handle when processing a user IO.
Note that I was not aware that Linux NVMe driver supported queue priorities...
> Shouldn't the ATA core translate the existing I/O priority levels into a
> command duration limit instead of introducing a new I/O priority class
> that is only supported by ATA devices?
There is only one priority class that ATA understands: RT (the level is
irrelevant and ignored). All RT class IOs are mapped to high priority NCQ
commands. All other classes map to normal priority (no priority bit set)
commands.
And sure, we could map the level of RT class IOs to a CDL index, as we do
for the CDL class, but what would be the point ? The user should use the
CDL class in that case.
Furthermore, there is one additional thing that we do not yet support but
will later: CDL descriptor 0 can be used to set a target time limit for
high priority NCQ commands. Without this new feature introduced with CDL,
the drive is free to schedule high priority NCQ commands as it wants, and
that is hard coded in FW. So you can endup with very aggressive scheduling
leading to significant overall IOPS drop and long tail latency for low
priority commands. See page 11 and 20 of this presentation for an example:
https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf
For a drive that supports both CDL and NCQ priority, with CDL feature
turned off, CDL descriptor 0 defines the time limit hint for high priority
NCQ commands. Again, CDL and NCQ high priority are mutually exclusive.
So for clarity, I really would prefer separating CDL and RT classes as we
did. We could integrate CDL support reusing the RT class + level for CDL
index, but I think this may be very confusing for users, especially
considering that the CLDs on a drive can be defined in any order the user
wants, resulting in indexes/levels that does do not have any particular
order, making it impossible for the host to correctly schedule commands.
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2023-01-24 22:59 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-24 19:02 [PATCH v3 00/18] Add Command Duration Limits support Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 01/18] block: introduce duration-limits priority class Niklas Cassel
2023-01-24 19:27 ` Bart Van Assche
2023-01-24 20:36 ` Bart Van Assche
2023-01-24 21:48 ` Damien Le Moal
2023-01-24 21:29 ` Damien Le Moal
2023-01-24 22:43 ` Bart Van Assche
2023-01-24 22:59 ` Damien Le Moal [this message]
2023-01-25 0:05 ` Bart Van Assche
2023-01-25 1:19 ` Damien Le Moal
2023-01-25 18:37 ` Bart Van Assche
2023-01-25 21:23 ` Niklas Cassel
2023-01-26 0:24 ` Damien Le Moal
2023-01-26 13:53 ` Niklas Cassel
2023-01-26 17:33 ` Bart Van Assche
2023-01-27 0:18 ` Damien Le Moal
2023-01-27 1:40 ` Damien Le Moal
2023-01-27 17:23 ` Bart Van Assche
2023-01-28 0:40 ` Damien Le Moal
2023-01-28 0:47 ` Bart Van Assche
2023-01-28 0:59 ` Damien Le Moal
2023-01-28 20:25 ` Martin K. Petersen
2023-01-29 3:52 ` Damien Le Moal
2023-01-30 13:44 ` Hannes Reinecke
2023-01-30 14:55 ` Damien Le Moal
2023-01-30 19:24 ` Bart Van Assche
2023-01-30 20:40 ` Bart Van Assche
2023-01-31 2:49 ` Martin K. Petersen
2023-01-31 3:10 ` Damien Le Moal
2023-01-30 19:13 ` Bart Van Assche
2023-01-31 2:58 ` Martin K. Petersen
2023-01-31 3:03 ` Damien Le Moal
2023-01-25 23:11 ` Keith Busch
2023-01-26 0:08 ` Damien Le Moal
2023-01-26 5:26 ` Christoph Hellwig
2023-01-25 6:33 ` Christoph Hellwig
2023-01-27 12:43 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 02/18] block: introduce BLK_STS_DURATION_LIMIT Niklas Cassel
2023-01-24 19:29 ` Bart Van Assche
2023-01-24 19:59 ` Keith Busch
2023-01-24 20:32 ` Bart Van Assche
2023-01-24 21:39 ` Damien Le Moal
2023-01-24 21:36 ` Damien Le Moal
2023-01-24 21:34 ` Damien Le Moal
2023-01-24 19:02 ` [PATCH v3 03/18] scsi: core: allow libata to complete successful commands via EH Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 04/18] scsi: rename and move get_scsi_ml_byte() Niklas Cassel
2023-01-24 19:32 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 05/18] scsi: support retrieving sub-pages of mode pages Niklas Cassel
2023-01-24 19:34 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 06/18] scsi: support service action in scsi_report_opcode() Niklas Cassel
2023-01-24 19:36 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 07/18] scsi: sd: detect support for command duration limits Niklas Cassel
2023-01-24 19:39 ` Bart Van Assche
2023-01-27 13:00 ` Hannes Reinecke
2023-01-28 0:51 ` Damien Le Moal
2023-01-28 2:52 ` Bart Van Assche
2023-01-29 2:05 ` Damien Le Moal
2023-01-24 19:02 ` [PATCH v3 08/18] scsi: sd: set read/write commands CDL index Niklas Cassel
2023-01-27 15:30 ` Hannes Reinecke
2023-01-28 0:03 ` Damien Le Moal
2023-01-30 18:13 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 09/18] scsi: sd: handle read/write CDL timeout failures Niklas Cassel
2023-01-27 15:34 ` Hannes Reinecke
2023-01-28 0:06 ` Damien Le Moal
2023-02-03 16:49 ` Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 10/18] ata: libata-scsi: remove unnecessary !cmd checks Niklas Cassel
2023-01-27 15:35 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 11/18] ata: libata: change ata_eh_request_sense() to not set CHECK_CONDITION Niklas Cassel
2023-01-27 15:36 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 12/18] ata: libata: detect support for command duration limits Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 13/18] ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in() Niklas Cassel
2023-01-27 15:37 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 14/18] ata: libata-scsi: add support for CDL pages mode sense Niklas Cassel
2023-01-27 15:38 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 15/18] ata: libata: add ATA feature control sub-page translation Niklas Cassel
2023-01-27 15:40 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 16/18] ata: libata: set read/write commands CDL index Niklas Cassel
2023-01-27 15:41 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 17/18] ata: libata: handle completion of CDL commands using policy 0xD Niklas Cassel
2023-01-27 15:43 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 18/18] Documentation: sysfs-block-device: document command duration limits Niklas Cassel
2023-01-27 15:43 ` Hannes Reinecke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4c345d8b-7efa-85c9-fe1c-1124ea5d9de6@opensource.wdc.com \
--to=damien.lemoal@opensource.wdc.com \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=niklas.cassel@wdc.com \
--cc=paolo.valente@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox