From: hare@suse.de (Hannes Reinecke)
Subject: [PATCH 3/3] nvme-multipath: automatic NUMA path balancing
Date: Fri, 2 Nov 2018 10:56:41 +0100
Message-ID: <20181102095641.28504-4-hare@suse.de>
In-Reply-To: <20181102095641.28504-1-hare@suse.de>
In order to utilize both paths on dual-ported HBAs, we cannot rely
on NUMA affinity alone, but rather have to distribute the
locality information to get the best possible result.
This patch implements a two-pass algorithm for assigning NUMA
locality information:
1. Distribute existing locality information so that no NUMA node has
more than one 'local' controller.
2. Assign a 'local' controller to each of the remaining nodes,
so that the overall weight (i.e. the sum of all locality information)
per controller is minimal.
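The two passes can be tried out in a small userspace model. This is a hypothetical sketch, not the kernel code: `struct topo`, `topo_init()`, `distribute()` and `topo_balance()` are invented names, `map[c][n]` stands in for `ctrl->node_map[]`, and it assumes a flat topology where a controller is LOCAL_DISTANCE (10) from its home node and REMOTE_DISTANCE (20) from every other node. Pass 1 mirrors nvme_mpath_distribute_paths(); pass 2 simply picks the controller with the largest row sum, seeding its search with -1 rather than LOCAL_DISTANCE * num_ctrls.

```c
#include <assert.h>

/* Userspace model only. Values match the kernel's defaults, but the
 * flat two-level distance model is an assumption of this sketch. */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20
#define NUMA_NO_NODE    (-1)
#define MAX_NODES 8
#define MAX_CTRLS 8

/* map[c][n]: distance from controller c to node n
 * (hypothetical stand-in for ctrl->node_map[]). */
struct topo {
	int nr_nodes, nr_ctrls;
	int map[MAX_CTRLS][MAX_NODES];
};

/* Reset: each controller is local only to its home node. */
static void topo_init(struct topo *t, int nr_nodes, int nr_ctrls,
		      const int *home_node)
{
	assert(nr_nodes <= MAX_NODES && nr_ctrls <= MAX_CTRLS);
	t->nr_nodes = nr_nodes;
	t->nr_ctrls = nr_ctrls;
	for (int c = 0; c < nr_ctrls; c++)
		for (int n = 0; n < nr_nodes; n++)
			t->map[c][n] = (n == home_node[c]) ?
				LOCAL_DISTANCE : REMOTE_DISTANCE;
}

/* Pass 1 helper: move controller c's local mark from node 'from' to the
 * node with the largest summed distance over all controllers. */
static void distribute(struct topo *t, int c, int from)
{
	int found = NUMA_NO_NODE, max = LOCAL_DISTANCE * t->nr_ctrls;

	for (int n = 0; n < t->nr_nodes; n++) {
		int sum = 0;

		for (int i = 0; i < t->nr_ctrls; i++)
			sum += t->map[i][n];
		if (sum > max) {
			max = sum;
			found = n;
		}
	}
	if (found != NUMA_NO_NODE) {
		t->map[c][found] = LOCAL_DISTANCE;
		t->map[c][from] = REMOTE_DISTANCE;
	}
}

static void topo_balance(struct topo *t)
{
	/* Pass 1: keep only the first local controller on each node. */
	for (int n = 0; n < t->nr_nodes; n++) {
		int seen = 0;

		for (int c = 0; c < t->nr_ctrls; c++) {
			if (t->map[c][n] != LOCAL_DISTANCE)
				continue;
			if (seen++)
				distribute(t, c, n);
		}
	}

	/* Pass 2: a node without a local controller gets the controller
	 * with the largest row sum, i.e. the fewest local nodes. */
	for (int n = 0; n < t->nr_nodes; n++) {
		int has_local = 0, best = -1, max = -1;

		for (int c = 0; c < t->nr_ctrls; c++)
			if (t->map[c][n] == LOCAL_DISTANCE)
				has_local = 1;
		if (has_local)
			continue;
		for (int c = 0; c < t->nr_ctrls; c++) {
			int sum = 0;

			for (int i = 0; i < t->nr_nodes; i++)
				sum += t->map[c][i];
			if (sum > max) {
				max = sum;
				best = c;
			}
		}
		if (best >= 0)
			t->map[best][n] = LOCAL_DISTANCE;
	}
}
```

For example, with two controllers both homed on node 0 of a two-node machine, topo_init() followed by topo_balance() leaves controller 0 local to node 0 and controller 1 local to node 1. On a four-node machine the same pair ends up alternating: controller 0 local to nodes 0 and 2, controller 1 local to nodes 1 and 3.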
Signed-off-by: Hannes Reinecke <hare@suse.com>
---
drivers/nvme/host/multipath.c | 89 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 88 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6d1412af7332..4944ffdf6831 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -259,15 +259,60 @@ static void nvme_requeue_work(struct work_struct *work)
 	}
 }
 
+void nvme_mpath_distribute_paths(struct nvme_subsystem *subsys, int num_ctrls,
+		struct nvme_ctrl *ctrl, int numa_node)
+{
+	int node;
+	int found_node = NUMA_NO_NODE;
+	int max = LOCAL_DISTANCE * num_ctrls;
+
+	for_each_node(node) {
+		struct nvme_ctrl *c;
+		int sum = 0;
+
+		list_for_each_entry(c, &subsys->ctrls, subsys_entry)
+			sum += c->node_map[node];
+		if (sum > max) {
+			max = sum;
+			found_node = node;
+		}
+	}
+	if (found_node != NUMA_NO_NODE) {
+		ctrl->node_map[found_node] = LOCAL_DISTANCE;
+		ctrl->node_map[numa_node] = REMOTE_DISTANCE;
+	}
+}
+
+void nvme_mpath_balance_node(struct nvme_subsystem *subsys,
+		int num_ctrls, int numa_node)
+{
+	struct nvme_ctrl *found = NULL, *ctrl;
+	int max = LOCAL_DISTANCE * num_ctrls, node;
+
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		int sum = 0;
+
+		for_each_node(node)
+			sum += ctrl->node_map[node];
+		if (sum > max) {
+			max = sum;
+			found = ctrl;
+		}
+	}
+	if (found)
+		found->node_map[numa_node] = LOCAL_DISTANCE;
+}
+
 void nvme_mpath_balance_subsys(struct nvme_subsystem *subsys)
 {
 	struct nvme_ctrl *ctrl;
+	int num_ctrls = 0;
 	int node;
 
 	mutex_lock(&subsys->lock);
 	/*
-	 * Reset set NUMA distance
+	 * 1. Reset NUMA distance
 	 * During creation the NUMA distance is only set
 	 * per controller, so after connecting the other
 	 * controllers the NUMA information on the existing
@@ -280,7 +325,49 @@ void nvme_mpath_balance_subsys(struct nvme_subsystem *subsys)
 			ctrl->node_map[node] =
 				node_distance(node, ctrl->numa_node);
 		}
+		num_ctrls++;
+	}
+
+	/*
+	 * 2. Distribute optimal paths:
+	 * Only one primary path per node.
+	 * Additional primary paths are moved to unassigned nodes.
+	 */
+	for_each_node(node) {
+		bool optimal = false;
+
+		list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+			if (ctrl->node_map[node] == LOCAL_DISTANCE) {
+				if (!optimal) {
+					optimal = true;
+					continue;
+				}
+				nvme_mpath_distribute_paths(subsys, num_ctrls,
+						ctrl, node);
+			}
+		}
+	}
+
+	/*
+	 * 3. Balance unassigned nodes:
+	 * Each unassigned node should have one primary path;
+	 * the primary path is assigned to the ctrl with the
+	 * least locality (i.e. the largest sum of distances over all nodes)
+	 */
+	for_each_node(node) {
+		bool optimal = false;
+
+		list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+			if (ctrl->node_map[node] == LOCAL_DISTANCE) {
+				optimal = true;
+				break;
+			}
+		}
+		if (optimal)
+			continue;
+		nvme_mpath_balance_node(subsys, num_ctrls, node);
 	}
+
 	mutex_unlock(&subsys->lock);
 }
 
--
2.16.4
Thread overview: 20+ messages
2018-11-02 9:56 [PATCHv3 0/3] nvme: NUMA locality for fabrics Hannes Reinecke
2018-11-02 9:56 ` [PATCH 1/3] nvme: NUMA locality information " Hannes Reinecke
2018-11-08 9:22 ` Christoph Hellwig
2018-11-08 9:35 ` Hannes Reinecke
2018-11-02 9:56 ` [PATCH 2/3] nvme-multipath: Select paths based on NUMA locality Hannes Reinecke
2018-11-08 9:32 ` Christoph Hellwig
2018-11-02 9:56 ` Hannes Reinecke [this message]
2018-11-08 9:36 ` [PATCH 3/3] nvme-multipath: automatic NUMA path balancing Christoph Hellwig
2018-11-16 8:12 ` [PATCHv3 0/3] nvme: NUMA locality for fabrics Christoph Hellwig
2018-11-16 8:21 ` Hannes Reinecke
2018-11-16 8:23 ` Christoph Hellwig
2018-11-19 22:31 ` Sagi Grimberg
2018-11-20 6:12 ` Hannes Reinecke
2018-11-20 9:41 ` Christoph Hellwig
2018-11-20 15:47 ` Keith Busch
2018-11-20 19:27 ` James Smart
2018-11-21 8:36 ` Christoph Hellwig
2018-11-20 16:21 ` Hannes Reinecke
2018-11-20 18:12 ` James Smart
-- strict thread matches above, loose matches on Subject: below --
2018-10-26 12:57 [PATCHv2 " Hannes Reinecke
2018-10-26 12:57 ` [PATCH 3/3] nvme-multipath: automatic NUMA path balancing Hannes Reinecke