From: Nilay Shroff <nilay@linux.ibm.com>
To: linux-nvme@lists.infradead.org
Cc: kbusch@kernel.org, hch@lst.de, sagi@grimberg.me, axboe@kernel.dk,
	hare@suse.de, dwagner@suse.de, gjoyce@ibm.com
Subject: [RFC PATCH 5/5] nvme-multipath: factor fabric link speed into path score
Date: Sun, 21 Sep 2025 16:42:25 +0530	[thread overview]
Message-ID: <20250921111234.863853-6-nilay@linux.ibm.com> (raw)
In-Reply-To: <20250921111234.863853-1-nilay@linux.ibm.com>

If the fabric adapter link speed is known, include it when calculating
the path score for the adaptive I/O policy, so that faster links
receive proportionally higher scores and slower links proportionally
lower ones.

For example, consider a multipath topology with two paths: one with
higher link speed but higher latency, and another with lower link
speed but lower latency. The scoring formula balances these factors,
so path selection does not blindly favor the faster link but weighs
link speed against latency to distribute I/O proportionally.

The updated path scoring formula is:

    path_X_score = link_speed_X * (NSEC_PER_SEC / path_X_ewma_latency)

where:
  - link_speed_X is the negotiated link speed of the fabric adapter
    (in Mbps),
  - path_X_ewma_latency is the smoothed latency (ns) derived from I/O
    completions,
  - NSEC_PER_SEC is used as a scaling factor.
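
For example (with hypothetical numbers), a 100G link reports
link_speed = 100,000 Mbps; with a smoothed latency of 200,000 ns its
fixed-point score is 100000 * (1000000000 / 200000) = 100000 * 5000 =
500,000,000.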

Weights are then normalized across all paths:

    path_X_weight = (path_X_score * 100) / total_score

This ensures that both lower latency and higher link speed contribute
positively to path selection, while still distributing I/O
proportionally when conditions differ across paths.
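
To make this concrete, here is a minimal userspace sketch of the same
fixed-point math (not the kernel code; both paths are hypothetical):
path 0 is the 100G link from the example above with 200,000 ns smoothed
latency, path 1 a 25G link with 50,000 ns smoothed latency. Path 0's
4x speed advantage exactly offsets its 4x latency penalty, so both
paths come out at a 50% weight:

    #include <stdio.h>
    #include <stdint.h>

    #define NSEC_PER_SEC 1000000000ULL

    int main(void)
    {
            /* Hypothetical per-path link speed (Mbps) and EWMA latency (ns) */
            uint64_t speed[2] = { 100000, 25000 };
            uint64_t slat[2]  = { 200000, 50000 };
            uint64_t score[2], total = 0;
            int i;

            for (i = 0; i < 2; i++) {
                    /* score = link_speed * (NSEC_PER_SEC / ewma_latency) */
                    score[i] = speed[i] * (NSEC_PER_SEC / slat[i]);
                    total += score[i];
            }

            /* Normalize scores to per-path weights out of 100. */
            for (i = 0; i < 2; i++)
                    printf("path %d: score=%llu weight=%llu%%\n", i,
                           (unsigned long long)score[i],
                           (unsigned long long)(score[i] * 100 / total));
            return 0;
    }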

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
 drivers/nvme/host/multipath.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index bcceb0fceb94..6ab42350284d 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -246,7 +246,7 @@ static void nvme_mpath_add_sample(struct request *rq, struct nvme_ns *ns)
 	unsigned int rw;
 	struct nvme_path_stat *stat;
 	struct nvme_ns *cur_ns;
-	u32 weight;
+	u32 weight, speed;
 	u64 now, latency, avg_lat_ns;
 	u64 total_score = 0;
 	struct nvme_ns_head *head = ns->head;
@@ -347,14 +347,18 @@ static void nvme_mpath_add_sample(struct request *rq, struct nvme_ns *ns)
 				continue;
 
 			/*
-			 * Compute the path score (inverse of smoothed latency),
-			 * scaled by NSEC_PER_SEC. Floating point math is not
-			 * available in the kernel, so fixed-point scaling is
-			 * used instead. NSEC_PER_SEC is chosen as the scale
-			 * because valid latencies are always < 1 second; and
-			 * we ignore longer latencies.
+			 * Compute the path score as the inverse of smoothed
+			 * latency, scaled by NSEC_PER_SEC. If the device speed
+			 * is known, it is factored in: higher speed increases
+			 * the score, lower speed decreases it. Floating point
+			 * math is unavailable in the kernel, so fixed-point
+			 * scaling is used instead. NSEC_PER_SEC is chosen
+			 * because valid latencies are always < 1 second; longer
+			 * latencies are ignored.
 			 */
-			stat->score = div_u64(NSEC_PER_SEC, stat->slat_ns);
+			speed = cur_ns->speed ? : 1;
+			stat->score = speed * div_u64(NSEC_PER_SEC,
+					stat->slat_ns);
 
 			/* Compute total score. */
 			total_score += stat->score;
-- 
2.51.0


