From: Hannes Reinecke <hare@suse.de>
To: Nilay Shroff <nilay@linux.ibm.com>, linux-nvme@lists.infradead.org
Cc: hch@lst.de, kbusch@kernel.org, sagi@grimberg.me, dwagner@suse.de,
	axboe@kernel.dk, gjoyce@ibm.com
Subject: Re: [RFC PATCHv4 2/6] nvme-multipath: add support for adaptive I/O policy
Date: Tue, 4 Nov 2025 15:57:24 +0100
Message-ID: <b07e9fa2-1f7a-47b5-bbd0-9d8d3d35d8ba@suse.de>
In-Reply-To: <20251104104533.138481-3-nilay@linux.ibm.com>

On 11/4/25 11:45, Nilay Shroff wrote:
> This commit introduces a new I/O policy named "adaptive". Users can
> configure it by writing "adaptive" to
> "/sys/class/nvme-subsystem/nvme-subsystemX/iopolicy".
> 
> The adaptive policy dynamically distributes I/O based on measured
> completion latency. The main idea is to calculate latency for each path,
> derive a weight, and then proportionally forward I/O according to those
> weights.
> 
> To ensure scalability, path latency is measured per-CPU. Each CPU
> maintains its own statistics, and I/O forwarding uses these per-CPU
> values. Every ~15 seconds, a simple average latency of the per-CPU
> batched samples is computed and fed into an Exponentially Weighted
> Moving Average (EWMA):
> 
> avg_latency = div_u64(batch, batch_count);
> new_ewma_latency = (prev_ewma_latency * (WEIGHT-1) + avg_latency)/WEIGHT
> 
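
For readers following along, the two formulas above can be sketched in
plain user-space C. This is only an illustration, not the code from the
patch: the names batch_avg_latency(), ewma_update() and EWMA_WEIGHT are
hypothetical, and seeding the EWMA with the first sample is an assumption
the description does not state.

```c
#include <stdint.h>

/* Assumed to match WEIGHT = 8 from the description. */
#define EWMA_WEIGHT 8

/* Simple average of one batch; returns 0 when the batch is empty. */
static uint64_t batch_avg_latency(uint64_t batch_sum_ns, uint64_t batch_count)
{
	return batch_count ? batch_sum_ns / batch_count : 0;
}

/* new = (prev * (WEIGHT - 1) + avg) / WEIGHT */
static uint64_t ewma_update(uint64_t prev_ewma_ns, uint64_t avg_ns)
{
	/* Assumption: with no history yet, start from the first average. */
	if (!prev_ewma_ns)
		return avg_ns;
	return (prev_ewma_ns * (EWMA_WEIGHT - 1) + avg_ns) / EWMA_WEIGHT;
}
```

With a previous EWMA of 800 ns and a new batch average of 1600 ns,
ewma_update() returns 900 ns, i.e. the estimate moves by one eighth of
the difference.
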
> With WEIGHT = 8, this assigns 7/8 (~87.5%) weight to the previous
> latency value and 1/8 (~12.5%) to the most recent latency. This
> smoothing reduces jitter, adapts quickly to changing conditions,
> avoids storing historical samples, and works well for both low and
> high I/O rates. Path weights are then derived from the smoothed (EWMA)
> latency as follows (example with two paths A and B):
> 
>      path_A_score = NSEC_PER_SEC / path_A_ewma_latency
>      path_B_score = NSEC_PER_SEC / path_B_ewma_latency
>      total_score  = path_A_score + path_B_score
> 
>      path_A_weight = (path_A_score * 100) / total_score
>      path_B_weight = (path_B_score * 100) / total_score
> 
> where:
>    - path_X_ewma_latency is the smoothed latency of a path in nanoseconds
>    - NSEC_PER_SEC is used as a scaling factor since valid latencies
>      are < 1 second
>    - weights are normalized to a 0–64 scale across all paths.
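
The weight derivation generalizes to any number of paths. A minimal
sketch in user-space C follows, assuming nothing about the actual patch
code: compute_weights() and its weight_scale parameter are hypothetical
names. The normalization constant is passed in rather than hard-coded
because the description shows a multiplier of 100 in the formula and a
0 to 64 scale in the notes.

```c
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Fill weights[] from the smoothed latencies in ewma_ns[].
 * A path with no latency data (0) gets a score and weight of 0.
 */
static void compute_weights(const uint64_t *ewma_ns, uint64_t *weights,
			    size_t npaths, uint64_t weight_scale)
{
	uint64_t total_score = 0;
	size_t i;

	/* score = NSEC_PER_SEC / ewma_latency; stash it in weights[] */
	for (i = 0; i < npaths; i++) {
		weights[i] = ewma_ns[i] ? NSEC_PER_SEC / ewma_ns[i] : 0;
		total_score += weights[i];
	}

	/* weight = score * scale / total_score */
	for (i = 0; i < npaths; i++)
		weights[i] = total_score ?
			weights[i] * weight_scale / total_score : 0;
}
```

For two paths with smoothed latencies of 100 us and 400 us and a scale
of 100, the scores are 10000 and 2500, which gives weights of 80 and 20.
Because of integer division the weights can sum to slightly less than
the scale for other inputs.
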
> 
> Path credits are refilled based on this weight, with one credit
> consumed per I/O. When all credits are consumed, the credits are
> refilled again based on the current weight. This ensures that I/O is
> distributed across paths proportionally to their calculated weight.
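
The credit scheme can be illustrated with a small user-space C sketch.
It is deliberately simplified and is not the selection logic from the
patch: struct path_credit and select_path() are hypothetical, there is
no per-CPU state, and paths are drained in index order, so the
distribution is only proportional over a full refill cycle.

```c
#include <stddef.h>
#include <stdint.h>

struct path_credit {
	uint64_t weight;	/* current weight of the path */
	uint64_t credits;	/* credits left until the next refill */
};

/*
 * Pick a path for one I/O and consume one credit.
 * Returns the path index, or -1 if every path has weight 0.
 */
static int select_path(struct path_credit *paths, size_t npaths)
{
	size_t i;
	int pass;

	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < npaths; i++) {
			if (paths[i].credits) {
				paths[i].credits--;
				return (int)i;
			}
		}
		/* All credits consumed: refill from the current weights. */
		for (i = 0; i < npaths; i++)
			paths[i].credits = paths[i].weight;
	}
	return -1;
}
```

With weights of 3 and 1, eight consecutive calls pick the first path six
times and the second path twice, matching the 3:1 ratio.
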
> 
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
>   drivers/nvme/host/core.c      |  15 +-
>   drivers/nvme/host/ioctl.c     |  31 ++-
>   drivers/nvme/host/multipath.c | 425 ++++++++++++++++++++++++++++++++--
>   drivers/nvme/host/nvme.h      |  74 +++++-
>   drivers/nvme/host/pr.c        |   6 +-
>   drivers/nvme/host/sysfs.c     |   2 +-
>   6 files changed, 530 insertions(+), 23 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

