Linux-NVME Archive on lore.kernel.org
From: Sagi Grimberg <sagi@grimberg.me>
To: Hannes Reinecke <hare@suse.de>, Keith Busch <kbusch@kernel.org>
Cc: Hannes Reinecke <hare@kernel.org>, Christoph Hellwig <hch@lst.de>,
	Jens Axboe <axboe@kernel.dk>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Subject: Re: [PATCHv2 0/2] block,nvme: latency-based I/O scheduler
Date: Sun, 7 Apr 2024 22:55:43 +0300	[thread overview]
Message-ID: <cd732ee4-4a53-46bd-a9f8-92237a3afbaf@grimberg.me> (raw)
In-Reply-To: <46e26322-a677-4c27-bc22-e2c65ed9d03c@suse.de>



On 05/04/2024 18:36, Hannes Reinecke wrote:
> On 4/5/24 17:03, Keith Busch wrote:
>> On Fri, Apr 05, 2024 at 08:21:14AM +0200, Hannes Reinecke wrote:
>>> On 4/4/24 23:14, Keith Busch wrote:
>>>> On Wed, Apr 03, 2024 at 04:17:54PM +0200, Hannes Reinecke wrote:
>>>>> Hi all,
>>>>>
>>>>> there have been several attempts to implement a latency-based I/O
>>>>> scheduler for native nvme multipath, all of which had their issues.
>>>>>
>>>>> So time to start afresh, this time using the QoS framework
>>>>> already present in the block layer.
>>>>> It consists of two parts:
>>>>> - a new 'blk-nlatency' QoS module, which is just a simple per-node
>>>>>     latency tracker
>>>>> - a 'latency' nvme I/O policy
>>>> Whatever happened with the io-depth based path selector? That should
>>>> naturally align with the lower latency path, and that metric is 
>>>> cheaper
>>>> to track.
>>>
>>> Turns out that tracking the queue depth (at the NVMe level) always
>>> requires an atomic, and with that a performance impact.
>>> The qos/blk-stat framework is already present, and as the numbers show
>>> it actually leads to a performance improvement.
>>>
>>> So I'm not quite sure what the argument 'cheaper to track' buys us 
>>> here...
>>
>> I was considering the blk_stat framework compared to those atomic
>> operations. I usually don't enable stats because all the extra
>> ktime_get_ns() and indirect calls are relatively costly. If you're
>> enabling stats anyway though, then yeah, I guess I don't really have a
>> point and your idea here seems pretty reasonable.
>
> Pretty much. Of course you need stats to be enabled.
> And the problem with the queue depth is that it's actually quite costly
> to compute; the whole sbitmap thingie is precisely there to _avoid_
> having to track the queue depth.
> I can't really see how one could track the queue depth efficiently;
> the beauty of the blk_stat framework is that it runs async, and the
> latency is only calculated after I/O has completed.
> We could do a 'mock' queue depth by calculating the difference between
> submitted and completed I/O, but even then you'd have to inject a call
> into the hot path to track the number of submissions.
>
> In the end, the latency tracker did what I wanted to achieve (namely
> balancing out uneven paths), _and_ ended up faster than round-robin,
> so I didn't care about queue depth tracking.

Hey Hannes,

I think it's fair to say that a latency tracker is a valid proxy for an
io-depth tracker.

I think we need Ewan to validate that this solves the original issue he
was trying to solve with his io-depth mpath selector. If so, I don't see
any major issues with this proposal.


Thread overview: 22+ messages
2024-04-03 14:17 [PATCHv2 0/2] block,nvme: latency-based I/O scheduler Hannes Reinecke
2024-04-03 14:17 ` [PATCH 1/2] block: track per-node I/O latency Hannes Reinecke
2024-04-04  2:22   ` kernel test robot
2024-04-04  2:55   ` kernel test robot
2024-04-04 18:47   ` kernel test robot
2024-04-03 14:17 ` [PATCH 2/2] nvme: add 'latency' iopolicy Hannes Reinecke
2024-04-04 21:14 ` [PATCHv2 0/2] block,nvme: latency-based I/O scheduler Keith Busch
2024-04-05  6:21   ` Hannes Reinecke
2024-04-05 15:03     ` Keith Busch
2024-04-05 15:36       ` Hannes Reinecke
2024-04-07 19:55         ` Sagi Grimberg [this message]
2024-05-09 20:43 ` [PATCH v3 0/3] " John Meneghini
2024-05-10  9:34   ` Niklas Cassel
2024-05-09 20:43 ` [PATCH v3 1/3] block: track per-node I/O latency John Meneghini
2024-05-10  7:11   ` Damien Le Moal
2024-05-10  9:28     ` Niklas Cassel
2024-05-10 10:00     ` Hannes Reinecke
2024-05-09 20:43 ` [PATCH v3 2/3] nvme: add 'latency' iopolicy John Meneghini
2024-05-10  7:17   ` Damien Le Moal
2024-05-10 10:03     ` Hannes Reinecke
2024-05-09 20:43 ` [PATCH v3 3/3] nvme: multipath: pr_notice when iopolicy changes John Meneghini
2024-05-10  7:19   ` Damien Le Moal
