public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Bart Van Assche <bvanassche@acm.org>
To: Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Paolo Valente <paolo.valente@linaro.org>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Hannes Reinecke <hare@suse.de>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [PATCH v3 01/18] block: introduce duration-limits priority class
Date: Fri, 27 Jan 2023 09:23:23 -0800	[thread overview]
Message-ID: <ddc88fa1-5aaa-4123-e43b-18dc37f477e9@acm.org> (raw)
In-Reply-To: <c8ef76be-c285-c797-5bdb-3a960821048b@opensource.wdc.com>

On 1/26/23 17:40, Damien Le Moal wrote:
> On 1/27/23 09:18, Damien Le Moal wrote:
>> On 1/27/23 02:33, Bart Van Assche wrote:
>>> How about only supporting a subset of the standard such that it becomes
>>> easy to map CDLs to host side priority levels?
>>
>> I am opposed to this, for several reasons:
>>
>> 1) We are seeing different use cases from users that cover a wide range of
>> use of CDL descriptors with various definitions.
>>
>> 2) Passthrough commands can be used by a user to change a drive CDL
>> descriptors without the kernel knowing about it, unless we spend our time
>> revalidating the CDL descriptor log page(s)...
>> 3) CDL standard as is is actually very sensible and not overloaded with
>> stuff that is only useful in niche use cases. For each CDL descriptor, you
>> have:
>>   * The active time limit, which is a clean way to specify how much time
>> you allow a drive to deal with bad sectors (mostly read case). A typical
>> HDD will try very hard to recover data from a sector, always. As a result,
>> the HDD may spend up to several seconds reading a sector again and again
>> applying different signal processing techniques until it gets the sector
>> ECC checked to return valid data. That of course can hugely increase an IO
>> latency seen by the host. In applications such as erasure coded
>> distributed object stores, maximum latency for an object access can thus
>> be kept low using this limit without compromising the data since the
>> object can always be rebuilt from the erasure codes if one HDD is slow to
>> respond. This limit is also interesting for video streaming/playback to
>> avoid video buffer underflow (at the expense of may be some block noise
>> depending on the codec).
>>   * The inactive time limit can be used to tell the drive how long it is
>> allowed to let a command stand in the drive internal queue before
>> processing. This is thus a parameter that allows a host to tune the drive
>> RPO optimization (rotational positioning optimization, e.g. HDD internal
>> command scheduling based on angular sector position on tracks withe the
>> head current position). This is a neat way to control max IOPS vs tail
>> latency since drives tend to privilege maximizing IOPS over lowering max
>> tail latency.
>>   * The duration guideline limit defines an overall time limit for a
>> command without distinguishing between active and inactive time. It is the
>> easiest to use (the easiest one to understand from a beginner user point
>> of view). This is a neat way to define an intelligent IO prioritization in
>> fact, way better than RT class scheduling on the host or the use of ATA
>> NCQ high priority, as it provides more information to the drive about the
>> urgency of a particular command. That allows the drive to still perform
>> RPO to maximize IOPS without long tail latencies. Chaining such limit with
>> an active+inactive time limit descriptor using the "next limit" policy
>> (0x1 policy) can also finely define what the drive should if the guideline
>> limit is exceeded (as the next descriptor can define what to do based on
>> the reason for the limit being exceeded: long internal queueing vs bad
>> sector long access time).
> 
> Note that all 3 limits can be used in a single CDL descriptor to precisely
> define how a command should be processed by the device. That is why it is
> nearly impossible to come up with a meaningful ordering of CDL descriptors
> as an increasing set of priority levels.

A summary of my concerns is as follows:
* The current I/O priority levels (RT, BE, IDLE) apply to all block 
devices. IOPRIO_CLASS_DL is only supported by certain block devices 
(some but not all SCSI harddisks). This forces applications to check the 
capabilities of the storage device before it can be decided whether or 
not IOPRIO_CLASS_DL can be used. This is not something applications 
should do but something the kernel should do. Additionally, if multiple 
dm devices are stacked on top of the block device driver, like in 
Android, it becomes even more cumbersome to check whether or not the 
block device supports CDL.
* For the RT, BE and IDLE classes, it is well defined which priority 
number represents a high priority and which priority number represents a 
low priority. For CDL, only the drive knows the priority details. I 
think that application software should be able to select a DL priority 
without having to read the CDL configuration first.

I hope that I have it made it clear that I think that the proposed user 
space API will be very painful to use for application developers.

Bart.


  reply	other threads:[~2023-01-27 17:23 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-24 19:02 [PATCH v3 00/18] Add Command Duration Limits support Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 01/18] block: introduce duration-limits priority class Niklas Cassel
2023-01-24 19:27   ` Bart Van Assche
2023-01-24 20:36     ` Bart Van Assche
2023-01-24 21:48       ` Damien Le Moal
2023-01-24 21:29     ` Damien Le Moal
2023-01-24 22:43       ` Bart Van Assche
2023-01-24 22:59         ` Damien Le Moal
2023-01-25  0:05           ` Bart Van Assche
2023-01-25  1:19             ` Damien Le Moal
2023-01-25 18:37               ` Bart Van Assche
2023-01-25 21:23                 ` Niklas Cassel
2023-01-26  0:24                   ` Damien Le Moal
2023-01-26 13:53                     ` Niklas Cassel
2023-01-26 17:33                       ` Bart Van Assche
2023-01-27  0:18                         ` Damien Le Moal
2023-01-27  1:40                           ` Damien Le Moal
2023-01-27 17:23                             ` Bart Van Assche [this message]
2023-01-28  0:40                               ` Damien Le Moal
2023-01-28  0:47                                 ` Bart Van Assche
2023-01-28  0:59                                   ` Damien Le Moal
2023-01-28 20:25                                     ` Martin K. Petersen
2023-01-29  3:52                                       ` Damien Le Moal
2023-01-30 13:44                                         ` Hannes Reinecke
2023-01-30 14:55                                           ` Damien Le Moal
2023-01-30 19:24                                         ` Bart Van Assche
2023-01-30 20:40                                         ` Bart Van Assche
2023-01-31  2:49                                         ` Martin K. Petersen
2023-01-31  3:10                                           ` Damien Le Moal
2023-01-30 19:13                                       ` Bart Van Assche
2023-01-31  2:58                                         ` Martin K. Petersen
2023-01-31  3:03                                           ` Damien Le Moal
2023-01-25 23:11               ` Keith Busch
2023-01-26  0:08                 ` Damien Le Moal
2023-01-26  5:26                 ` Christoph Hellwig
2023-01-25  6:33         ` Christoph Hellwig
2023-01-27 12:43   ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 02/18] block: introduce BLK_STS_DURATION_LIMIT Niklas Cassel
2023-01-24 19:29   ` Bart Van Assche
2023-01-24 19:59     ` Keith Busch
2023-01-24 20:32       ` Bart Van Assche
2023-01-24 21:39         ` Damien Le Moal
2023-01-24 21:36       ` Damien Le Moal
2023-01-24 21:34     ` Damien Le Moal
2023-01-24 19:02 ` [PATCH v3 03/18] scsi: core: allow libata to complete successful commands via EH Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 04/18] scsi: rename and move get_scsi_ml_byte() Niklas Cassel
2023-01-24 19:32   ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 05/18] scsi: support retrieving sub-pages of mode pages Niklas Cassel
2023-01-24 19:34   ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 06/18] scsi: support service action in scsi_report_opcode() Niklas Cassel
2023-01-24 19:36   ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 07/18] scsi: sd: detect support for command duration limits Niklas Cassel
2023-01-24 19:39   ` Bart Van Assche
2023-01-27 13:00   ` Hannes Reinecke
2023-01-28  0:51     ` Damien Le Moal
2023-01-28  2:52       ` Bart Van Assche
2023-01-29  2:05         ` Damien Le Moal
2023-01-24 19:02 ` [PATCH v3 08/18] scsi: sd: set read/write commands CDL index Niklas Cassel
2023-01-27 15:30   ` Hannes Reinecke
2023-01-28  0:03     ` Damien Le Moal
2023-01-30 18:13       ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 09/18] scsi: sd: handle read/write CDL timeout failures Niklas Cassel
2023-01-27 15:34   ` Hannes Reinecke
2023-01-28  0:06     ` Damien Le Moal
2023-02-03 16:49     ` Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 10/18] ata: libata-scsi: remove unnecessary !cmd checks Niklas Cassel
2023-01-27 15:35   ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 11/18] ata: libata: change ata_eh_request_sense() to not set CHECK_CONDITION Niklas Cassel
2023-01-27 15:36   ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 12/18] ata: libata: detect support for command duration limits Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 13/18] ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in() Niklas Cassel
2023-01-27 15:37   ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 14/18] ata: libata-scsi: add support for CDL pages mode sense Niklas Cassel
2023-01-27 15:38   ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 15/18] ata: libata: add ATA feature control sub-page translation Niklas Cassel
2023-01-27 15:40   ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 16/18] ata: libata: set read/write commands CDL index Niklas Cassel
2023-01-27 15:41   ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 17/18] ata: libata: handle completion of CDL commands using policy 0xD Niklas Cassel
2023-01-27 15:43   ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 18/18] Documentation: sysfs-block-device: document command duration limits Niklas Cassel
2023-01-27 15:43   ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ddc88fa1-5aaa-4123-e43b-18dc37f477e9@acm.org \
    --to=bvanassche@acm.org \
    --cc=Niklas.Cassel@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=paolo.valente@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox