Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Keith Busch <kbusch@kernel.org>
Cc: Hannes Reinecke <hare@kernel.org>, Christoph Hellwig <hch@lst.de>,
	Sagi Grimberg <sagi@grimberg.me>, Jens Axboe <axboe@kernel.dk>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Subject: Re: [PATCHv2 0/2] block,nvme: latency-based I/O scheduler
Date: Fri, 5 Apr 2024 17:36:40 +0200	[thread overview]
Message-ID: <46e26322-a677-4c27-bc22-e2c65ed9d03c@suse.de> (raw)
In-Reply-To: <ZhAS0YKfV07qlUes@kbusch-mbp.dhcp.thefacebook.com>

On 4/5/24 17:03, Keith Busch wrote:
> On Fri, Apr 05, 2024 at 08:21:14AM +0200, Hannes Reinecke wrote:
>> On 4/4/24 23:14, Keith Busch wrote:
>>> On Wed, Apr 03, 2024 at 04:17:54PM +0200, Hannes Reinecke wrote:
>>>> Hi all,
>>>>
>>>> there had been several attempts to implement a latency-based I/O
>>>> scheduler for native nvme multipath, all of which had its issues.
>>>>
>>>> So time to start afresh, this time using the QoS framework
>>>> already present in the block layer.
>>>> It consists of two parts:
>>>> - a new 'blk-nlatency' QoS module, which is just a simple per-node
>>>>     latency tracker
>>>> - a 'latency' nvme I/O policy
>>> Whatever happened with the io-depth based path selector? That should
>>> naturally align with the lower latency path, and that metric is cheaper
>>> to track.
>>
>> Turns out that tracking queue depth (on the NVMe level) always requires
>> an atomic, and with that a performance impact.
>> The qos/blk-stat framework is already present, and as the numbers show
>> actually leads to a performance improvement.
>>
>> So I'm not quite sure what the argument 'cheaper to track' buys us here...
> 
> I was considering the blk_stat framework compared to those atomic
> operations. I usually don't enable stats because all the extra
> ktime_get_ns() and indirect calls are relatively costly. If you're
> enabling stats anyway though, then yeah, I guess I don't really have a
> point and your idea here seems pretty reasonable.

Pretty much. Of course you need stats to be enabled.
And problem with the queue depth is that it's actually quite costly
to compute; the while sbitmap thingie is precisely there to _avoid_
having to track the queue depth.
I can't really see how one could track the queue depth efficiently;
the beauty of the blk_stat framework is that it's running async, and
only calculated after I/O is completed.
We could do a 'mock' queue depth by calculating the difference between
submitted and completed I/O, but even then you'd have to inject a call
in the hot path to track the number of submissions.

In the end, the latency tracker did what I wanted to achieve (namely
balance out uneven paths), _and_ got faster than round-robin, so I 
didn't care about queue depth tracking.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich



  reply	other threads:[~2024-04-05 15:36 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-03 14:17 [PATCHv2 0/2] block,nvme: latency-based I/O scheduler Hannes Reinecke
2024-04-03 14:17 ` [PATCH 1/2] block: track per-node I/O latency Hannes Reinecke
2024-04-04  2:22   ` kernel test robot
2024-04-04  2:55   ` kernel test robot
2024-04-04 18:47   ` kernel test robot
2024-04-03 14:17 ` [PATCH 2/2] nvme: add 'latency' iopolicy Hannes Reinecke
2024-04-04 21:14 ` [PATCHv2 0/2] block,nvme: latency-based I/O scheduler Keith Busch
2024-04-05  6:21   ` Hannes Reinecke
2024-04-05 15:03     ` Keith Busch
2024-04-05 15:36       ` Hannes Reinecke [this message]
2024-04-07 19:55         ` Sagi Grimberg
2024-05-09 20:43 ` [PATCH v3 0/3] " John Meneghini
2024-05-10  9:34   ` Niklas Cassel
2024-05-09 20:43 ` [PATCH v3 1/3] block: track per-node I/O latency John Meneghini
2024-05-10  7:11   ` Damien Le Moal
2024-05-10  9:28     ` Niklas Cassel
2024-05-10 10:00     ` Hannes Reinecke
2024-05-09 20:43 ` [PATCH v3 2/3] nvme: add 'latency' iopolicy John Meneghini
2024-05-10  7:17   ` Damien Le Moal
2024-05-10 10:03     ` Hannes Reinecke
2024-05-09 20:43 ` [PATCH v3 3/3] nvme: multipath: pr_notice when iopolicy changes John Meneghini
2024-05-10  7:19   ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46e26322-a677-4c27-bc22-e2c65ed9d03c@suse.de \
    --to=hare@suse.de \
    --cc=axboe@kernel.dk \
    --cc=hare@kernel.org \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox