From: Hannes Reinecke <hare@suse.de>
To: Nilay Shroff <nilay@linux.ibm.com>, linux-nvme@lists.infradead.org
Cc: hch@lst.de, kbusch@kernel.org, sagi@grimberg.me, dwagner@suse.de,
axboe@kernel.dk, gjoyce@ibm.com
Subject: Re: [RFC PATCHv4 2/6] nvme-multipath: add support for adaptive I/O policy
Date: Tue, 4 Nov 2025 15:57:24 +0100
Message-ID: <b07e9fa2-1f7a-47b5-bbd0-9d8d3d35d8ba@suse.de>
In-Reply-To: <20251104104533.138481-3-nilay@linux.ibm.com>
On 11/4/25 11:45, Nilay Shroff wrote:
> This commit introduces a new I/O policy named "adaptive". Users can
> configure it by writing "adaptive" to
> "/sys/class/nvme-subsystem/nvme-subsystemX/iopolicy".
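
For reference, the sysfs write described above (the same as running
`echo adaptive > /sys/class/nvme-subsystem/nvme-subsystemX/iopolicy` as
root) could be done from C roughly as follows. This is a minimal sketch;
`set_iopolicy()` is a hypothetical helper, not part of nvme-cli or any
other existing tool.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write an I/O policy name to an NVMe subsystem's
 * iopolicy attribute. Returns 0 on success, -1 on failure.
 */
static int set_iopolicy(const char *attr_path, const char *policy)
{
	FILE *f = fopen(attr_path, "w");

	if (!f)
		return -1;
	if (fputs(policy, f) == EOF) {
		fclose(f);
		return -1;
	}
	/*
	 * stdio buffers the write, so an error returned by the sysfs store
	 * handler may only surface when the stream is flushed in fclose().
	 */
	return fclose(f) == 0 ? 0 : -1;
}
```

Usage would be something like
set_iopolicy("/sys/class/nvme-subsystem/nvme-subsystem0/iopolicy", "adaptive"),
with nvme-subsystem0 standing in for the subsystem actually in use.
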
>
> The adaptive policy dynamically distributes I/O based on measured
> completion latency. The main idea is to calculate latency for each path,
> derive a weight, and then proportionally forward I/O according to those
> weights.
>
> To ensure scalability, path latency is measured per-CPU. Each CPU
> maintains its own statistics, and I/O forwarding uses these per-CPU
> values. Every ~15 seconds, the simple average latency of the per-CPU
> batched samples is computed and fed into an Exponentially Weighted
> Moving Average (EWMA):
>
> avg_latency = div_u64(batch, batch_count);
> new_ewma_latency = (prev_ewma_latency * (WEIGHT-1) + avg_latency)/WEIGHT
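
The batch average and EWMA update above can be sketched in userspace C
as follows. This is an illustration only, assuming nanosecond latencies
held in uint64_t; plain division stands in for div_u64(), `ewma_update()`
is a hypothetical name, and seeding the EWMA with the first batch average
when no previous value exists is an assumption of the sketch rather than
something the commit message states.

```c
#include <stdint.h>

/* WEIGHT from the commit message: 7/8 old value, 1/8 new sample. */
#define EWMA_WEIGHT 8ULL

/*
 * Fold one batch of latency samples (nanoseconds) into the EWMA.
 * batch is the sum of the sampled latencies, batch_count the number
 * of samples in the batch.
 */
static uint64_t ewma_update(uint64_t prev_ewma, uint64_t batch,
			    uint64_t batch_count)
{
	uint64_t avg;

	if (batch_count == 0)
		return prev_ewma;	/* no new samples: keep the old value */

	avg = batch / batch_count;	/* userspace stand-in for div_u64() */

	if (prev_ewma == 0)
		return avg;		/* assumption: seed with first average */

	return (prev_ewma * (EWMA_WEIGHT - 1) + avg) / EWMA_WEIGHT;
}
```

For example, with a previous EWMA of 1000 ns and a new batch average of
2000 ns, ewma_update(1000, 2000, 1) returns 1125 ns: one interval moves
the smoothed value only 1/8 of the way toward the new average.
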
>
> With WEIGHT = 8, this assigns 7/8 (~87.5%) weight to the previous
> latency value and 1/8 (~12.5%) to the most recent latency. This
> smoothing reduces jitter, adapts quickly to changing conditions,
> avoids storing historical samples, and works well for both low and
> high I/O rates. Path weights are then derived from the smoothed (EWMA)
> latency as follows (example with two paths A and B):
>
> path_A_score = NSEC_PER_SEC / path_A_ewma_latency
> path_B_score = NSEC_PER_SEC / path_B_ewma_latency
> total_score = path_A_score + path_B_score
>
> path_A_weight = (path_A_score * 64) / total_score
> path_B_weight = (path_B_score * 64) / total_score
>
> where:
> - path_X_ewma_latency is the smoothed latency of a path in nanoseconds
> - NSEC_PER_SEC is used as a scaling factor since valid latencies
> are < 1 second
> - weights are normalized to a 0–64 scale across all paths.
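
The score and weight calculation generalizes from two paths to N. A
minimal sketch, with the normalization scale passed in as a parameter;
`compute_weights()` is a hypothetical name, and giving a path with no
latency data (EWMA of 0) a weight of 0 is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Compute per-path weights from smoothed latencies in nanoseconds:
 *   score_i   = NSEC_PER_SEC / ewma_i
 *   weights_i = score_i * scale / sum(score)
 * A latency of one second or more yields a score of 0. Paths with an
 * EWMA of 0 (no data yet) are skipped and get weight 0.
 */
static void compute_weights(const uint64_t *ewma_ns, unsigned int *weights,
			    size_t npaths, unsigned int scale)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < npaths; i++)
		if (ewma_ns[i])
			total += NSEC_PER_SEC / ewma_ns[i];

	for (i = 0; i < npaths; i++) {
		uint64_t score = ewma_ns[i] ? NSEC_PER_SEC / ewma_ns[i] : 0;

		weights[i] = total ? (unsigned int)(score * scale / total) : 0;
	}
}
```

With smoothed latencies of 100 µs and 300 µs and a scale of 64, the
weights come out as 48 and 15; integer truncation means they can sum to
slightly less than the scale.
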
>
> Path credits are refilled based on this weight, with one credit
> consumed per I/O. When all credits are consumed, the credits are
> refilled again based on the current weight. This ensures that I/O is
> distributed across paths proportionally to their calculated weight.
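
One way to read the credit scheme, sketched below: each path gets
credits equal to its weight, each I/O takes one credit from the next path
that still has some, and all paths are refilled once every credit is
used. The round-robin scan order and the `struct path` / `select_path()`
names are assumptions for illustration, not the patch's actual data
structures.

```c
#include <stddef.h>

/* Hypothetical per-path state for this sketch. */
struct path {
	unsigned int weight;	/* from the latency-based weight calculation */
	unsigned int credits;	/* I/Os this path may still take this round */
};

static void refill_credits(struct path *paths, size_t npaths)
{
	for (size_t i = 0; i < npaths; i++)
		paths[i].credits = paths[i].weight;
}

/*
 * Return the index of the path for the next I/O, or -1 if every weight
 * is 0. *cursor records where the next round-robin scan starts.
 */
static int select_path(struct path *paths, size_t npaths, size_t *cursor)
{
	for (int pass = 0; pass < 2; pass++) {
		for (size_t n = 0; n < npaths; n++) {
			size_t i = (*cursor + n) % npaths;

			if (paths[i].credits) {
				paths[i].credits--;	/* one credit per I/O */
				*cursor = (i + 1) % npaths;
				return (int)i;
			}
		}
		/* No path has credits left: refill all from current weights. */
		refill_credits(paths, npaths);
	}
	return -1;
}
```

With weights 3 and 1, eight consecutive calls send six I/Os to path 0 and
two to path 1, matching the 3:1 weight ratio.
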
>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
> drivers/nvme/host/core.c | 15 +-
> drivers/nvme/host/ioctl.c | 31 ++-
> drivers/nvme/host/multipath.c | 425 ++++++++++++++++++++++++++++++++--
> drivers/nvme/host/nvme.h | 74 +++++-
> drivers/nvme/host/pr.c | 6 +-
> drivers/nvme/host/sysfs.c | 2 +-
> 6 files changed, 530 insertions(+), 23 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich