Message-ID: <0a2b9ba7-cc3f-400b-6a8a-c38ba269af75@opensource.wdc.com>
Date: Mon, 30 Jan 2023 23:55:06 +0900
Subject: Re: [PATCH v3 01/18] block: introduce duration-limits priority class
From: Damien Le Moal
Organization: Western Digital Research
To: Hannes Reinecke, "Martin K. Petersen"
Cc: Bart Van Assche, Niklas Cassel, Paolo Valente, Jens Axboe, Christoph Hellwig, "linux-scsi@vger.kernel.org", "linux-ide@vger.kernel.org", "linux-block@vger.kernel.org"
X-Mailing-List: linux-block@vger.kernel.org

On 1/30/23 22:44, Hannes Reinecke wrote:
> On 1/29/23 04:52, Damien Le Moal wrote:
>> On 1/29/23 05:25, Martin K. Petersen wrote:
> [ .. ]
>>>
>>> As such, I don't like the "just customize your settings with
>>> cdltools" approach. I'd much rather see us try to define a few QoS
>>> classes that make sense that would apply to every app and use those
>>> to define the application interface. And then have the kernel program
>>> those CDL classes into SCSI/ATA devices by default.
>>
>> Makes sense.
>> Though I think it will be hard to define a set of QoS hints that
>> are useful for a wide range of applications, and even harder to convert the
>> defined hint classes to CDL descriptors. I fear that we may end up with the same
>> issues as IO hints/streams.
>>
>>> Having the kernel provide an abstract interface for bio QoS and
>>> configuring a new disk with a sane handful of classes does not
>>> prevent $CLOUD_VENDOR from overriding what Linux configured. But at
>>> least we'd have a generic approach to block QoS in Linux. Similar to
>>> the existing I/O priority infrastructure which is also not tied to
>>> any particular hardware feature.
>>
>> OK. See below about this.
>>
>>> A generic implementation also allows us to do fancy things in the
>>> hypervisor where we would like to be able to do QoS across multiple
>>> devices as well. Without having ATA or SCSI with CDL involved. Or
>>> whatever things might look like in NVMe.
>>
>> Fair point, especially given that virtio actually already forwards a guest
>> ioprio to the host through the virtio block command. Thinking of that particular
>> point together with what you said, I came up with the change shown below as a
>> replacement for this patch 1/18.
>>
>> This changes the 13-bit ioprio data into a 3-bit QOS hint + 3 bits of IO prio
>> level. This is consistent with the IO prio interface since IO priority levels
>> have to be between 0 and 7 (otherwise, errors are returned). So in fact, the
>> upper 10 bits of the ioprio data are ignored and we can safely use 3 of these
>> bits for an IO hint.
>>
>> This hint applies to all priority classes and levels, that is, for the CDL case,
>> we can enrich any priority with a hint that specifies the CDL index to use for
>> an IO.
>>
>> This falls short of actually defining generic IO hints, but this has the
>> advantage of not breaking anything for current applications using IO priorities
>> and not requiring any change to existing IO schedulers, while still allowing
>> CDL indexes for IOs to be passed down to the scsi & ATA layers (which for now
>> would be the only layers in the kernel acting on the ioprio qos hints).
>>
>> I think that this approach still allows us to enable CDL support, and on top of
>> it, go further and define generic QOS hints that IO schedulers can use and that
>> also potentially map to CDL for scsi & ata (similarly to the RT class IOs
>> mapping to the NCQ priority feature if the user enabled that feature).
>>
>> As mentioned above, I think that defining generic IO hint classes will be
>> difficult. But the change below is, I think, a good starting point that should
>> not prevent working on that.
>>
>> Thoughts ?
>>
> I like the idea.
> QoS is one of the recurring topics always coming up sooner or later when
> talking of storage networks, so having _some_ concept of QoS in the
> linux kernel (for storage) would be beneficial.
>
> Maybe time for a topic at LSF?

Yes. I was hoping for a quicker resolution so that we can get the CDL
"mechanical" bits in, but without a nice API for it, we cannot :)
Trying to compile something with Niklas. So far, we are thinking of having QOS
flags + QOS data, the flags determining how (and if) the QOS data is used and
what it means. Examples of things we could have:
* IOPRIO_QOS_FAILFAST: do not retry the IO if it fails the first time
* IOPRIO_QOS_DURATION_LIMIT: the QOS data then indicates the limit to use
  (number). That can be implemented in schedulers and also map to CDL on drives
  that support that feature.

That is the difficult part: what else? For now, considering only our target of
adding scsi & ata CDL support, the above is enough. But is that enough in
general for most users/apps?
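
To make the "QOS flags + QOS data" idea a bit more concrete, here is a rough
userspace sketch. It is an illustration only and is not taken from any patch.
Assumptions: the class stays in the top 3 bits (IOPRIO_CLASS_SHIFT is 13, as in
the current interface) and the level in the low 3 bits; putting 3 bits of QOS
data at bits 3-5 and 2 flag bits at bits 6-7 is a hypothetical choice, and all
the QOS_* and qos_* names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as in the existing ioprio interface: class in the top 3 bits. */
#define IOPRIO_CLASS_SHIFT	13

/* Hypothetical split of the 13-bit ioprio data field (names are made up). */
#define QOS_LEVEL_BITS		3	/* bits 0-2: priority level, 0..7 */
#define QOS_DATA_SHIFT		QOS_LEVEL_BITS
#define QOS_DATA_BITS		3	/* bits 3-5: QOS data */
#define QOS_FLAGS_SHIFT		(QOS_DATA_SHIFT + QOS_DATA_BITS)
#define QOS_FLAGS_BITS		2	/* bits 6-7: QOS flags */

#define QOS_MASK(bits)		((1u << (bits)) - 1)

/* Hypothetical flag bits, modeled on the two examples given above. */
#define QOS_FLAG_FAILFAST	0x1u	/* do not retry a failed IO */
#define QOS_FLAG_DURATION_LIMIT	0x2u	/* QOS data is a limit number */

/* The QOS fields must fit below the class bits. */
static_assert(QOS_FLAGS_SHIFT + QOS_FLAGS_BITS <= IOPRIO_CLASS_SHIFT,
	      "QOS fields overlap the priority class");

/* Pack class, QOS flags, QOS data and level into a 16-bit ioprio value. */
static inline uint16_t qos_ioprio_value(unsigned int cls, unsigned int flags,
					unsigned int data, unsigned int level)
{
	return (uint16_t)(((cls & 0x7u) << IOPRIO_CLASS_SHIFT) |
			  ((flags & QOS_MASK(QOS_FLAGS_BITS)) << QOS_FLAGS_SHIFT) |
			  ((data & QOS_MASK(QOS_DATA_BITS)) << QOS_DATA_SHIFT) |
			  (level & QOS_MASK(QOS_LEVEL_BITS)));
}

static inline unsigned int qos_ioprio_class(uint16_t v)
{
	return v >> IOPRIO_CLASS_SHIFT;
}

static inline unsigned int qos_ioprio_level(uint16_t v)
{
	return v & QOS_MASK(QOS_LEVEL_BITS);
}

static inline unsigned int qos_ioprio_flags(uint16_t v)
{
	return (v >> QOS_FLAGS_SHIFT) & QOS_MASK(QOS_FLAGS_BITS);
}

/*
 * The flags decide if the data is used at all: return the limit number only
 * when QOS_FLAG_DURATION_LIMIT is set, and 0 (no limit) otherwise.
 */
static inline unsigned int qos_duration_limit(uint16_t v)
{
	if (!(qos_ioprio_flags(v) & QOS_FLAG_DURATION_LIMIT))
		return 0;
	return (v >> QOS_DATA_SHIFT) & QOS_MASK(QOS_DATA_BITS);
}
```

With flags and data both 0, the packed value is the same as what
IOPRIO_PRIO_VALUE(class, level) gives today, so in this sketch existing users
of IO priorities would not see any difference.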
>
> Cheers,
>
> Hannes
>

-- 
Damien Le Moal
Western Digital Research