From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
To: Bart Van Assche <bvanassche@acm.org>,
Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Paolo Valente <paolo.valente@linaro.org>,
Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
Hannes Reinecke <hare@suse.de>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [PATCH v3 01/18] block: introduce duration-limits priority class
Date: Sat, 28 Jan 2023 09:40:37 +0900 [thread overview]
Message-ID: <049a7e88-89d1-804f-a0b5-9e5d93d505f7@opensource.wdc.com> (raw)
In-Reply-To: <ddc88fa1-5aaa-4123-e43b-18dc37f477e9@acm.org>
On 1/28/23 02:23, Bart Van Assche wrote:
> A summary of my concerns is as follows:
> * The current I/O priority levels (RT, BE, IDLE) apply to all block
> devices. IOPRIO_CLASS_DL is only supported by certain block devices
> (some but not all SCSI harddisks). This forces applications to check the
> capabilities of the storage device before it can be decided whether or
> not IOPRIO_CLASS_DL can be used. This is not something applications
> should do but something the kernel should do. Additionally, if multiple
> dm devices are stacked on top of the block device driver, like in
> Android, it becomes even more cumbersome to check whether or not the
> block device supports CDL.
Yes, RT, BE and IDLE apply to all block devices. And so does CDL in the sense
that if a user specifies the CDL class for IOs to a device that does not support
CDL, then nothing special will happen. There will be no differentiation of the
IOs. That *exactly* what happens when using RT, BE or IDLE with the none
scheduler (e.g. default nvme setup). And the same remark applies to RT class
mapping to ATA NCQ priority feature: the user needs to check the device to know
if that will happen, *and* also needs to turn on that feature for that mapping
to be effective.
The levels of the CDL priority class are also very well defined: they map to the
CDL descriptors defined on the drive, which are consultable by the user through
sysfs (no special tools needed), so easily discoverable.
As for DM devices, these have no scheduler. So any processing of a priority
class by a DM target driver is that driver responsibility. Initially, all that
happens is the block layer passing on that information through the stack with
the BIOs. That's it. Real action may happen once the physical block device is
reached with the IO scheduler for that device, if one is set.
At that level, none scheduler is of no concern, nothing will happen. Kyber also
ignores priorities. We are left with only bfq and mq-deadline. The latter only
cares about the priority class, ignoring levels. bfq does act on both class and
level.
IOPRIO_CLASS_DL is equal to 4, so strictly speaking, is of lower priority than
the IDLE class if you want to consider it as part of that ordering. But we
defined it as a different class to allow *not* having to do that. IO schedulers
can be modified to ignore that priority class for now, mapping it to say the
default BE class for instance. Our current patch set maps the CDL class to the
RT class for the schedulers, as that made most sense given the time-sensitive
nature of CDL workloads. But we can change that to actually let the scheduler
decide if you want. There are no other changes in the block layer that have or
need special handling of the CDL class. All very clean in my opinion, no special
conditions for that feature. No additional "if" in the hot path, no overhead added.
> * For the RT, BE and IDLE classes, it is well defined which priority
> number represents a high priority and which priority number represents a
> low priority. For CDL, only the drive knows the priority details. I
> think that application software should be able to select a DL priority
> without having to read the CDL configuration first.
The levels of the CDL priority class are also very well defined: they map to the
CDL descriptors defined on the drive, which are consultable by the user through
sysfs (no special tools needed), so easily discoverable. And unless we restrict
how CDL descriptors can be defined, which I explained in my previous email is
not desirable at all, we cannot and should not try to order levels in some sort
of priority semantic. CDL semantic does not define directly a priority level,
only time limits, which may or may not be ordered, depending on the limits
definitions.
As Niklas pointed out, this is not a "generic" feature that any random
application can magically use without modifications. The application must be
aware of what CDL is and if how the descriptors are. And for 99.99% of the use
cases, the CDL descriptors will be defined in a way usefull for that
application. There is no magic generic set of descriptors defined by default.
Though a simple set of increasing time limits that can be cleanly mapped to
priority levels. A system administrator is free to do that for the system drives
if that is what the running applications expect. CDL is a very flexible feature
that can cover a lot of use cases. Trying to shoehorn in into the legacy/classic
priority semantic framework would only restrict its usefulness.
> I hope that I have it made it clear that I think that the proposed user
> space API will be very painful to use for application developers.
I completely disagree. Reusing the prio class/level API made it easy to allow
applications to use the feature. fio support for CDL requires exactly *one line*
change, to allow for the CDL class number 4. That's it. From there, one can use
the --cmdprio_class=4 nd --cmdprio=idx options to exercise a drive. The value of
"idx" here of course depends on how the descriptors are set on the drive. But
back to the point above. This depends on the application goals and the
descriptors are set accordingly for that goal. There is no real discovery needed
by the application. The application expect a certain set of CDL limits for its
use case, and checking that this set is the one currently defined on the drive
is easy to do from an application with the sysfs interface we added.
Many users out there have deployed and using applications taking advantage of
ATA NCQ priority feature, using class RT for high priority IOs. The new CDL
class does not require many application changes to be enabled for next gen
drives that will have CDL.
>
> Bart.
>
--
Damien Le Moal
Western Digital Research
next prev parent reply other threads:[~2023-01-28 0:40 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-24 19:02 [PATCH v3 00/18] Add Command Duration Limits support Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 01/18] block: introduce duration-limits priority class Niklas Cassel
2023-01-24 19:27 ` Bart Van Assche
2023-01-24 20:36 ` Bart Van Assche
2023-01-24 21:48 ` Damien Le Moal
2023-01-24 21:29 ` Damien Le Moal
2023-01-24 22:43 ` Bart Van Assche
2023-01-24 22:59 ` Damien Le Moal
2023-01-25 0:05 ` Bart Van Assche
2023-01-25 1:19 ` Damien Le Moal
2023-01-25 18:37 ` Bart Van Assche
2023-01-25 21:23 ` Niklas Cassel
2023-01-26 0:24 ` Damien Le Moal
2023-01-26 13:53 ` Niklas Cassel
2023-01-26 17:33 ` Bart Van Assche
2023-01-27 0:18 ` Damien Le Moal
2023-01-27 1:40 ` Damien Le Moal
2023-01-27 17:23 ` Bart Van Assche
2023-01-28 0:40 ` Damien Le Moal [this message]
2023-01-28 0:47 ` Bart Van Assche
2023-01-28 0:59 ` Damien Le Moal
2023-01-28 20:25 ` Martin K. Petersen
2023-01-29 3:52 ` Damien Le Moal
2023-01-30 13:44 ` Hannes Reinecke
2023-01-30 14:55 ` Damien Le Moal
2023-01-30 19:24 ` Bart Van Assche
2023-01-30 20:40 ` Bart Van Assche
2023-01-31 2:49 ` Martin K. Petersen
2023-01-31 3:10 ` Damien Le Moal
2023-01-30 19:13 ` Bart Van Assche
2023-01-31 2:58 ` Martin K. Petersen
2023-01-31 3:03 ` Damien Le Moal
2023-01-25 23:11 ` Keith Busch
2023-01-26 0:08 ` Damien Le Moal
2023-01-26 5:26 ` Christoph Hellwig
2023-01-25 6:33 ` Christoph Hellwig
2023-01-27 12:43 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 02/18] block: introduce BLK_STS_DURATION_LIMIT Niklas Cassel
2023-01-24 19:29 ` Bart Van Assche
2023-01-24 19:59 ` Keith Busch
2023-01-24 20:32 ` Bart Van Assche
2023-01-24 21:39 ` Damien Le Moal
2023-01-24 21:36 ` Damien Le Moal
2023-01-24 21:34 ` Damien Le Moal
2023-01-24 19:02 ` [PATCH v3 03/18] scsi: core: allow libata to complete successful commands via EH Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 04/18] scsi: rename and move get_scsi_ml_byte() Niklas Cassel
2023-01-24 19:32 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 05/18] scsi: support retrieving sub-pages of mode pages Niklas Cassel
2023-01-24 19:34 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 06/18] scsi: support service action in scsi_report_opcode() Niklas Cassel
2023-01-24 19:36 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 07/18] scsi: sd: detect support for command duration limits Niklas Cassel
2023-01-24 19:39 ` Bart Van Assche
2023-01-27 13:00 ` Hannes Reinecke
2023-01-28 0:51 ` Damien Le Moal
2023-01-28 2:52 ` Bart Van Assche
2023-01-29 2:05 ` Damien Le Moal
2023-01-24 19:02 ` [PATCH v3 08/18] scsi: sd: set read/write commands CDL index Niklas Cassel
2023-01-27 15:30 ` Hannes Reinecke
2023-01-28 0:03 ` Damien Le Moal
2023-01-30 18:13 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 09/18] scsi: sd: handle read/write CDL timeout failures Niklas Cassel
2023-01-27 15:34 ` Hannes Reinecke
2023-01-28 0:06 ` Damien Le Moal
2023-02-03 16:49 ` Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 10/18] ata: libata-scsi: remove unnecessary !cmd checks Niklas Cassel
2023-01-27 15:35 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 11/18] ata: libata: change ata_eh_request_sense() to not set CHECK_CONDITION Niklas Cassel
2023-01-27 15:36 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 12/18] ata: libata: detect support for command duration limits Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 13/18] ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in() Niklas Cassel
2023-01-27 15:37 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 14/18] ata: libata-scsi: add support for CDL pages mode sense Niklas Cassel
2023-01-27 15:38 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 15/18] ata: libata: add ATA feature control sub-page translation Niklas Cassel
2023-01-27 15:40 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 16/18] ata: libata: set read/write commands CDL index Niklas Cassel
2023-01-27 15:41 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 17/18] ata: libata: handle completion of CDL commands using policy 0xD Niklas Cassel
2023-01-27 15:43 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 18/18] Documentation: sysfs-block-device: document command duration limits Niklas Cassel
2023-01-27 15:43 ` Hannes Reinecke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=049a7e88-89d1-804f-a0b5-9e5d93d505f7@opensource.wdc.com \
--to=damien.lemoal@opensource.wdc.com \
--cc=Niklas.Cassel@wdc.com \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=paolo.valente@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox