From: Bart Van Assche <bvanassche@acm.org>
To: Damien Le Moal <Damien.LeMoal@wdc.com>, Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [PATCH block-5.14] Revert "block/mq-deadline: Add cgroup support"
Date: Fri, 13 Aug 2021 10:15:08 -0700
Message-ID: <9699c8e8-ef8f-ef37-1f99-0c446ff9d9a0@acm.org>
In-Reply-To: <DM6PR04MB7081F2D0E8579489175DF363E7FA9@DM6PR04MB7081.namprd04.prod.outlook.com>
On 8/12/21 7:18 PM, Damien Le Moal wrote:
> Let me throw in more information related to this.
>
> Command duration limits (CDL) and Sequestered commands features are being
> drafted in SPC/SBC and ACS to give the device better hints than just an on/off
> high priority bit. I am currently prototyping these features and I am reusing
> the ioprio interface for that. Here is how this works:
> 1) The drive exposes a set of command duration limit descriptors (up to 7 for
> reads and 7 for writes) that define duration limits for a command execution:
> overall processing time, queuing time and execution time. Each duration limit has
> a policy associated with it that is applied if command processing exceeds that
> limit: continue, continue but signal limit exceeded, or abort.
> 2) Users can change the drive command duration limits to whatever they need
> (e.g. change the policies for the limits to get a fast-fail behavior for
> commands instead of having the drive retry for a long time)
> 3) When issuing IOs, users (or FSes) can apply a command duration limit
> descriptor by specifying the IOPRIO_CLASS_DL priority class. The priority level
> for that class indicates the descriptor to apply to the command.
> 4) At SCSI/ATA level, read and write commands have 3 bits defined to specify the
> command descriptor to apply to the command (1 to 7 or 0 for no limit)
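Steps 3 and 4 can be sketched as a small bit-packing exercise. The block below mirrors the existing Linux ioprio encoding (class in the bits above `IOPRIO_CLASS_SHIFT`, level in the low bits) in Python. The numeric value chosen for `IOPRIO_CLASS_DL` and the helper names `cdl_ioprio` and `cdl_descriptor` are assumptions made for illustration; the message does not give them, and the prototype may differ.

```python
# Existing Linux ioprio encoding (include/uapi/linux/ioprio.h).
IOPRIO_CLASS_SHIFT = 13
IOPRIO_PRIO_MASK = (1 << IOPRIO_CLASS_SHIFT) - 1

IOPRIO_CLASS_NONE = 0
IOPRIO_CLASS_RT = 1
IOPRIO_CLASS_BE = 2
IOPRIO_CLASS_IDLE = 3
# Hypothetical: the class value used by the prototype is not stated.
IOPRIO_CLASS_DL = 4

# The message describes descriptors 1 to 7, with 0 meaning "no limit".
CDL_MAX_DESCRIPTOR = 7


def ioprio_value(prio_class: int, level: int) -> int:
    """Pack a priority class and level the way IOPRIO_PRIO_VALUE() does."""
    return (prio_class << IOPRIO_CLASS_SHIFT) | (level & IOPRIO_PRIO_MASK)


def cdl_ioprio(descriptor: int) -> int:
    """Build an ioprio that selects a duration limit descriptor (step 3)."""
    if not 0 <= descriptor <= CDL_MAX_DESCRIPTOR:
        raise ValueError("descriptor index must be in 0..7")
    return ioprio_value(IOPRIO_CLASS_DL, descriptor)


def cdl_descriptor(ioprio: int) -> int:
    """Return the 3-bit descriptor index for the command (step 4).

    Any ioprio that is not in the duration limit class maps to 0 (no limit).
    """
    if ioprio >> IOPRIO_CLASS_SHIFT != IOPRIO_CLASS_DL:
        return 0
    return ioprio & CDL_MAX_DESCRIPTOR
```

Under these assumptions, `cdl_descriptor(cdl_ioprio(3))` yields 3, while an ioprio in the best-effort class yields 0, so commands without a duration limit class are sent with no limit.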
>
> With that in place, the disk firmware can now make more intelligent decisions on
> command scheduling to keep performance high at high queue depth without
> increasing latency for commands that have low duration limits. And based on the
> policy defined for a limit, this can be a "soft" best-effort optimization by the
> disk, or a hard one with aborts if the drive decides that what the user is
> asking for is not possible.
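The difference between a "soft" and a "hard" limit can be illustrated with a toy model of one descriptor. This is only a sketch of the behavior described in the message, not the draft specification: the field names, the microsecond units, the convention that 0 disables a limit, and the rule that the strictest triggered policy wins are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Policy(Enum):
    CONTINUE = "continue"
    CONTINUE_AND_SIGNAL = "continue but signal limit exceeded"
    ABORT = "abort"


# Assumed ordering from least to most strict.
SEVERITY = [Policy.CONTINUE, Policy.CONTINUE_AND_SIGNAL, Policy.ABORT]


@dataclass
class Limit:
    duration_us: int  # assumed unit; 0 is taken to mean "no limit"
    policy: Policy


@dataclass
class Descriptor:
    overall: Limit    # queuing time plus execution time
    queuing: Limit
    execution: Limit


def evaluate(desc: Descriptor, queued_us: int, executed_us: int) -> Optional[Policy]:
    """Return the strictest policy whose limit was exceeded, or None."""
    observed = [
        (desc.overall, queued_us + executed_us),
        (desc.queuing, queued_us),
        (desc.execution, executed_us),
    ]
    triggered = [
        limit.policy
        for limit, elapsed in observed
        if limit.duration_us and elapsed > limit.duration_us
    ]
    if not triggered:
        return None
    return max(triggered, key=SEVERITY.index)
```

A descriptor whose limits all use `Policy.CONTINUE` corresponds to the best-effort case, while one that uses `Policy.ABORT` gives the fast-fail behavior mentioned in item 2.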
>
> CDL can completely replace the existing binary on/off NCQ priority in a more
> flexible manner as the user can set different duration limits for high vs low
> priority. E.g. high priority is equivalent to a very short limit while low
> priority is equivalent to longer or no limits.
>
> I think that CDL has the potential for better interactions with cgroups as
> cgroup controllers can install a set of limits on the drive that fits the
> controller target policy. E.g., the latency controller can set duration limits
> and use the IOPRIO_CLASS_DL class to tell the drive the exact latency target to use.
>
> In my implementation, I have not yet looked into cgroups integration for CDL
> though. I am still wondering what the best approach is: defining a new
> controller or integrating into existing controllers. The former is likely easier
> than the latter, but having hardware support for existing controllers has the
> potential to improve them seamlessly without forcing the user to change anything
> to their application setup.
>
> CDL is still in draft state in the specs though. So I will not be sending this yet.
Thanks, Damien, for providing this additional information. This is
very helpful. I see this as a welcome evolution since the disk firmware
has more information than the CPU (e.g. about the disk head position)
and hence can make a better decision than an I/O scheduler or cgroup policy.

For the cloud use case, are all disks used to implement disaggregated
storage? I'm asking this because in a disaggregated storage setup the
I/O submitter runs on a different server than the one to which the disks
are connected. In such a setup I expect that the I/O priority will be
provided from user space instead of being provided by a cgroup.

Bart.