From: Bart Van Assche <bvanassche@acm.org>
To: Damien Le Moal <damien.lemoal@opensource.wdc.com>,
Niklas Cassel <niklas.cassel@wdc.com>,
Paolo Valente <paolo.valente@linaro.org>,
Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>, Hannes Reinecke <hare@suse.de>,
linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
linux-block@vger.kernel.org
Subject: Re: [PATCH v3 01/18] block: introduce duration-limits priority class
Date: Wed, 25 Jan 2023 10:37:52 -0800 [thread overview]
Message-ID: <e8324901-7c18-153f-b47f-112a394832bd@acm.org> (raw)
In-Reply-To: <275993f1-f9e8-e7a8-e901-2f7d3a6bb501@opensource.wdc.com>
On 1/24/23 17:19, Damien Le Moal wrote:
> On 1/25/23 09:05, Bart Van Assche wrote:
>> Thanks again for the detailed reply. Your replies are very informative
>> and help me understand the context better.
>>
>> However, I'm still less than enthusiast about the introduction of the
>> I/O priority class IOPRIO_CLASS_DL. To me command duration limits (CDL)
>> is a mechanism that is supported by one storage standard (SCSI) and of
>
> And ATA (ACS) too. Not just SCSI. This is actually an improvement over IO
> priority (command priority) that is supported only by ATA NCQ and does not
> exist with SCSI/SBC.
>
>> which it is not sure that it will be integrated in other storage
>> standards (NVMe, ...). Isn't the purpose of the block layer to provide
>> an interface that is independent of the specifics of a single storage
>> standard? This is why I'm in favor of letting the ATA core translate one
>> of the existing I/O priority classes into a CDL instead of introducing a
>> new I/O priority class (IOPRIO_CLASS_DL) in the block layer.
>
> We discussed CDL with Hannes in the context of NVMe over fabrics. Their
> may be interesting extensions to consider for NVMe in that context (the
> value for local PCI attached NVMe drive is more limited at best).
>
> I would argue that IO priority is the same: that is not supported by all
> device classes either, and for those that support it, the semantic is not
> identical (ATA vs NVMe). Yet, we have the RT class that maps to high
> priority for ATA, and nothing else as far as I know.
>
> CDL at least covers SCSI *and* ATA, and as mentioned above, could be used
> by NVMe-of host drivers to do fancy link selection for a multipath setup
> based on the link speed for instance.
>
> We could overload the RT class with a mapping to CDL feature on scsi and
> ata, but I think this is more confusing/messy than a separate class as we
> implemented.
Hi Damien,
The more I think about this, the more I'm convinced that it would be wrong
to introduce IOPRIO_CLASS_DL. Datacenters will have a mix of drives that
support CDL and drives that do not support CDL. It seems wrong to me to
make user space software responsible for figuring out whether or not the
drive supports CDL before it can be decided which I/O priority class should
be used. This is something the kernel should do instead of user space
software.
If we would ask Android storage vendors to implement CDL then IOPRIO_CLASS_DL
would cause trouble too. Android has support since considerable time to give
the foreground application a higher I/O priority than background applications.
The cgroup settings for foreground and background applications come from the
task_profiles.json file (see also
https://android.googlesource.com/platform/system/core/+/master/libprocessgroup/profiles/task_profiles.json).
As one can see all the settings in that file are independent of the features
of the storage device. Introducing IOPRIO_CLASS_DL in the kernel and using it
in task_profiles.json would imply that the storage device type has to be
determined before it can be decided whether or not IOPRIO_CLASS_DL can be used.
This seems wrong to me.
I downloaded the patch series in its entirety and applied it on a local kernel
branch. I verified which changes would be needed to replace IOPRIO_CLASS_DL
with IOPRIO_CLASS_RT. Can you help me with verifying the patch below?
Regarding the BFQ changes in the patch below, is an I/O scheduler useful at all
if CDL is used since a comment in the BFQ driver says that the disk should do
the scheduling instead of BFQ if CDL is used?
Thanks,
Bart.
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7add9346c585..815b884d6c5a 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5545,14 +5545,6 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
bfqq->new_ioprio_class = IOPRIO_CLASS_IDLE;
bfqq->new_ioprio = 7;
break;
- case IOPRIO_CLASS_DL:
- /*
- * For the duration-limits class, we want the disk to do the
- * scheduling. So map all levels to the highest RT level.
- */
- bfqq->new_ioprio = 0;
- bfqq->new_ioprio_class = IOPRIO_CLASS_RT;
- break;
}
if (bfqq->new_ioprio >= IOPRIO_NR_LEVELS) {
@@ -5681,8 +5673,6 @@ static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
return &bfqg->async_bfqq[1][ioprio][act_idx];
case IOPRIO_CLASS_IDLE:
return &bfqg->async_idle_bfqq[act_idx];
- case IOPRIO_CLASS_DL:
- return &bfqg->async_bfqq[0][0][act_idx];
default:
return NULL;
}
diff --git a/block/blk-ioprio.c b/block/blk-ioprio.c
index dfb5c3f447f4..8bb6b8eba4ce 100644
--- a/block/blk-ioprio.c
+++ b/block/blk-ioprio.c
@@ -27,7 +27,6 @@
* @POLICY_RESTRICT_TO_BE: modify IOPRIO_CLASS_NONE and IOPRIO_CLASS_RT into
* IOPRIO_CLASS_BE.
* @POLICY_ALL_TO_IDLE: change the I/O priority class into IOPRIO_CLASS_IDLE.
- * @POLICY_ALL_TO_DL: change the I/O priority class into IOPRIO_CLASS_DL.
*
* See also <linux/ioprio.h>.
*/
@@ -36,7 +35,6 @@ enum prio_policy {
POLICY_NONE_TO_RT = 1,
POLICY_RESTRICT_TO_BE = 2,
POLICY_ALL_TO_IDLE = 3,
- POLICY_ALL_TO_DL = 4,
};
static const char *policy_name[] = {
@@ -44,7 +42,6 @@ static const char *policy_name[] = {
[POLICY_NONE_TO_RT] = "none-to-rt",
[POLICY_RESTRICT_TO_BE] = "restrict-to-be",
[POLICY_ALL_TO_IDLE] = "idle",
- [POLICY_ALL_TO_DL] = "duration-limits",
};
static struct blkcg_policy ioprio_policy;
diff --git a/block/ioprio.c b/block/ioprio.c
index 1b3a9da82597..32a456b45804 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -37,7 +37,6 @@ int ioprio_check_cap(int ioprio)
switch (class) {
case IOPRIO_CLASS_RT:
- case IOPRIO_CLASS_DL:
/*
* Originally this only checked for CAP_SYS_ADMIN,
* which was implicitly allowed for pid 0 by security
@@ -48,7 +47,7 @@ int ioprio_check_cap(int ioprio)
if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_NICE))
return -EPERM;
fallthrough;
- /* RT and DL have prio field too */
+ /* rt has prio field too */
case IOPRIO_CLASS_BE:
if (data >= IOPRIO_NR_LEVELS || data < 0)
return -EINVAL;
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 526d0ea4dbf9..f10c2a0d18d4 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -113,7 +113,6 @@ static const enum dd_prio ioprio_class_to_prio[] = {
[IOPRIO_CLASS_RT] = DD_RT_PRIO,
[IOPRIO_CLASS_BE] = DD_BE_PRIO,
[IOPRIO_CLASS_IDLE] = DD_IDLE_PRIO,
- [IOPRIO_CLASS_DL] = DD_RT_PRIO,
};
static inline struct rb_root *
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index b4761c3c4b91..3065b632e6ae 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -673,7 +673,7 @@ static inline void ata_set_tf_cdl(struct ata_queued_cmd *qc, int ioprio)
struct ata_taskfile *tf = &qc->tf;
int cdl;
- if (IOPRIO_PRIO_CLASS(ioprio) != IOPRIO_CLASS_DL)
+ if (IOPRIO_PRIO_CLASS(ioprio) != IOPRIO_CLASS_RT)
return;
cdl = IOPRIO_PRIO_DATA(ioprio) & 0x07;
diff --git a/drivers/scsi/sd_cdl.c b/drivers/scsi/sd_cdl.c
index 59d02dbb5ea1..c5286f5ddae4 100644
--- a/drivers/scsi/sd_cdl.c
+++ b/drivers/scsi/sd_cdl.c
@@ -880,10 +880,10 @@ int sd_cdl_dld(struct scsi_disk *sdkp, struct scsi_cmnd *scmd)
unsigned int dld;
/*
- * Use "no limit" if the request ioprio class is not IOPRIO_CLASS_DL
+ * Use "no limit" if the request ioprio class is not IOPRIO_CLASS_RT
* or if the user specified an invalid CDL descriptor index.
*/
- if (IOPRIO_PRIO_CLASS(ioprio) != IOPRIO_CLASS_DL)
+ if (IOPRIO_PRIO_CLASS(ioprio) != IOPRIO_CLASS_RT)
return 0;
dld = IOPRIO_PRIO_DATA(ioprio);
diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h
index 2f3fc2fbd668..7578d4f6a969 100644
--- a/include/linux/ioprio.h
+++ b/include/linux/ioprio.h
@@ -20,7 +20,7 @@ static inline bool ioprio_valid(unsigned short ioprio)
{
unsigned short class = IOPRIO_PRIO_CLASS(ioprio);
- return class > IOPRIO_CLASS_NONE && class <= IOPRIO_CLASS_DL;
+ return class > IOPRIO_CLASS_NONE && class <= IOPRIO_CLASS_IDLE;
}
/*
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index 15908b9e9d8c..f70f2596a6bf 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -29,7 +29,6 @@ enum {
IOPRIO_CLASS_RT,
IOPRIO_CLASS_BE,
IOPRIO_CLASS_IDLE,
- IOPRIO_CLASS_DL,
};
/*
@@ -38,12 +37,6 @@ enum {
#define IOPRIO_NR_LEVELS 8
#define IOPRIO_BE_NR IOPRIO_NR_LEVELS
-/*
- * The Duration limits class allows 8 levels: level 0 for "no limit" and levels
- * 1 to 7, each corresponding to a read or write limit descriptor.
- */
-#define IOPRIO_DL_NR_LEVELS 8
-
enum {
IOPRIO_WHO_PROCESS = 1,
IOPRIO_WHO_PGRP,
next prev parent reply other threads:[~2023-01-25 18:37 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-24 19:02 [PATCH v3 00/18] Add Command Duration Limits support Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 01/18] block: introduce duration-limits priority class Niklas Cassel
2023-01-24 19:27 ` Bart Van Assche
2023-01-24 20:36 ` Bart Van Assche
2023-01-24 21:48 ` Damien Le Moal
2023-01-24 21:29 ` Damien Le Moal
2023-01-24 22:43 ` Bart Van Assche
2023-01-24 22:59 ` Damien Le Moal
2023-01-25 0:05 ` Bart Van Assche
2023-01-25 1:19 ` Damien Le Moal
2023-01-25 18:37 ` Bart Van Assche [this message]
2023-01-25 21:23 ` Niklas Cassel
2023-01-26 0:24 ` Damien Le Moal
2023-01-26 13:53 ` Niklas Cassel
2023-01-26 17:33 ` Bart Van Assche
2023-01-27 0:18 ` Damien Le Moal
2023-01-27 1:40 ` Damien Le Moal
2023-01-27 17:23 ` Bart Van Assche
2023-01-28 0:40 ` Damien Le Moal
2023-01-28 0:47 ` Bart Van Assche
2023-01-28 0:59 ` Damien Le Moal
2023-01-28 20:25 ` Martin K. Petersen
2023-01-29 3:52 ` Damien Le Moal
2023-01-30 13:44 ` Hannes Reinecke
2023-01-30 14:55 ` Damien Le Moal
2023-01-30 19:24 ` Bart Van Assche
2023-01-30 20:40 ` Bart Van Assche
2023-01-31 2:49 ` Martin K. Petersen
2023-01-31 3:10 ` Damien Le Moal
2023-01-30 19:13 ` Bart Van Assche
2023-01-31 2:58 ` Martin K. Petersen
2023-01-31 3:03 ` Damien Le Moal
2023-01-25 23:11 ` Keith Busch
2023-01-26 0:08 ` Damien Le Moal
2023-01-26 5:26 ` Christoph Hellwig
2023-01-25 6:33 ` Christoph Hellwig
2023-01-27 12:43 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 02/18] block: introduce BLK_STS_DURATION_LIMIT Niklas Cassel
2023-01-24 19:29 ` Bart Van Assche
2023-01-24 19:59 ` Keith Busch
2023-01-24 20:32 ` Bart Van Assche
2023-01-24 21:39 ` Damien Le Moal
2023-01-24 21:36 ` Damien Le Moal
2023-01-24 21:34 ` Damien Le Moal
2023-01-24 19:02 ` [PATCH v3 03/18] scsi: core: allow libata to complete successful commands via EH Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 04/18] scsi: rename and move get_scsi_ml_byte() Niklas Cassel
2023-01-24 19:32 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 05/18] scsi: support retrieving sub-pages of mode pages Niklas Cassel
2023-01-24 19:34 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 06/18] scsi: support service action in scsi_report_opcode() Niklas Cassel
2023-01-24 19:36 ` Bart Van Assche
2023-01-24 19:02 ` [PATCH v3 07/18] scsi: sd: detect support for command duration limits Niklas Cassel
2023-01-24 19:39 ` Bart Van Assche
2023-01-27 13:00 ` Hannes Reinecke
2023-01-28 0:51 ` Damien Le Moal
2023-01-28 2:52 ` Bart Van Assche
2023-01-29 2:05 ` Damien Le Moal
2023-01-24 19:02 ` [PATCH v3 08/18] scsi: sd: set read/write commands CDL index Niklas Cassel
2023-01-27 15:30 ` Hannes Reinecke
2023-01-28 0:03 ` Damien Le Moal
2023-01-30 18:13 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 09/18] scsi: sd: handle read/write CDL timeout failures Niklas Cassel
2023-01-27 15:34 ` Hannes Reinecke
2023-01-28 0:06 ` Damien Le Moal
2023-02-03 16:49 ` Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 10/18] ata: libata-scsi: remove unnecessary !cmd checks Niklas Cassel
2023-01-27 15:35 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 11/18] ata: libata: change ata_eh_request_sense() to not set CHECK_CONDITION Niklas Cassel
2023-01-27 15:36 ` Hannes Reinecke
2023-01-24 19:02 ` [PATCH v3 12/18] ata: libata: detect support for command duration limits Niklas Cassel
2023-01-24 19:02 ` [PATCH v3 13/18] ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in() Niklas Cassel
2023-01-27 15:37 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 14/18] ata: libata-scsi: add support for CDL pages mode sense Niklas Cassel
2023-01-27 15:38 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 15/18] ata: libata: add ATA feature control sub-page translation Niklas Cassel
2023-01-27 15:40 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 16/18] ata: libata: set read/write commands CDL index Niklas Cassel
2023-01-27 15:41 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 17/18] ata: libata: handle completion of CDL commands using policy 0xD Niklas Cassel
2023-01-27 15:43 ` Hannes Reinecke
2023-01-24 19:03 ` [PATCH v3 18/18] Documentation: sysfs-block-device: document command duration limits Niklas Cassel
2023-01-27 15:43 ` Hannes Reinecke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e8324901-7c18-153f-b47f-112a394832bd@acm.org \
--to=bvanassche@acm.org \
--cc=axboe@kernel.dk \
--cc=damien.lemoal@opensource.wdc.com \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-ide@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=niklas.cassel@wdc.com \
--cc=paolo.valente@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox