From: hare@suse.de (Hannes Reinecke)
Subject: [PATCH 3/3] nvme-multipath: automatic NUMA path balancing
Date: Fri, 26 Oct 2018 14:57:18 +0200
Message-ID: <20181026125718.122767-4-hare@suse.de>
In-Reply-To: <20181026125718.122767-1-hare@suse.de>
In order to utilize both paths of a dual-ported HBA we cannot rely
on NUMA affinity alone, but rather have to distribute the locality
information across nodes to get the best possible result.
This patch implements a two-pass algorithm for assigning NUMA
locality information:
1. Distribute the existing locality information so that no node has
more than one 'local' controller.
2. Assign a 'local' controller to each of the remaining nodes, so
that the overall weight (i.e. the sum of all locality information)
per ctrl is minimal (a standalone sketch follows below).
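As a rough illustration (not part of this patch), the following
standalone userspace sketch simulates both passes for a hypothetical
two-controller, four-node system. It uses the stock LOCAL_DISTANCE
(10) and REMOTE_DISTANCE (20) values from <linux/topology.h>; the
real code walks subsys->ctrls under subsys->lock instead of plain
arrays:

#include <stdio.h>

#define LOCAL_DISTANCE	10
#define REMOTE_DISTANCE	20
#define NR_NODES	4
#define NR_CTRLS	2

/* both ctrls sit on node 0, so both start out 'local' there */
static int node_map[NR_CTRLS][NR_NODES] = {
	{ 10, 20, 20, 20 },
	{ 10, 20, 20, 20 },
};

int main(void)
{
	int node, c, n;

	/* Pass 1: demote duplicate 'local' entries per node. */
	for (node = 0; node < NR_NODES; node++) {
		int seen = 0;

		for (c = 0; c < NR_CTRLS; c++) {
			int found = -1, max = LOCAL_DISTANCE * NR_CTRLS;

			if (node_map[c][node] != LOCAL_DISTANCE)
				continue;
			if (!seen++)
				continue;
			/* re-route to the most distant unassigned node */
			for (n = 0; n < NR_NODES; n++) {
				int sum = 0, i;

				for (i = 0; i < NR_CTRLS; i++)
					sum += node_map[i][n];
				if (sum > max) {
					max = sum;
					found = n;
				}
			}
			if (found >= 0) {
				node_map[c][found] = LOCAL_DISTANCE;
				node_map[c][node] = REMOTE_DISTANCE;
			}
		}
	}

	/* Pass 2: each node without a 'local' ctrl gets the ctrl
	 * with the largest total distance. */
	for (node = 0; node < NR_NODES; node++) {
		int found = -1, max = LOCAL_DISTANCE * NR_CTRLS;

		for (c = 0; c < NR_CTRLS; c++)
			if (node_map[c][node] == LOCAL_DISTANCE)
				break;
		if (c < NR_CTRLS)
			continue;
		for (c = 0; c < NR_CTRLS; c++) {
			int sum = 0;

			for (n = 0; n < NR_NODES; n++)
				sum += node_map[c][n];
			if (sum > max) {
				max = sum;
				found = c;
			}
		}
		if (found >= 0)
			node_map[found][node] = LOCAL_DISTANCE;
	}

	for (c = 0; c < NR_CTRLS; c++) {
		printf("ctrl %d:", c);
		for (n = 0; n < NR_NODES; n++)
			printf(" %2d", node_map[c][n]);
		printf("\n");
	}
	return 0;
}

For this toy topology the sketch prints

  ctrl 0: 10 20 10 20
  ctrl 1: 20 10 20 10

i.e. each controller ends up 'local' to two of the four nodes.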
Signed-off-by: Hannes Reinecke <hare@suse.com>
---
drivers/nvme/host/multipath.c | 111 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index a589a1a7b6ce..9e4183401539 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -262,6 +262,115 @@ static void nvme_requeue_work(struct work_struct *work)
}
}
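+
+/*
+ * A node ended up with more than one 'local' controller: re-route
+ * ctrl's 'local' assignment to the node with the highest summed
+ * distance across all controllers, typically one which no
+ * controller claims as 'local' yet.
+ */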
+void nvme_mpath_distribute_paths(struct nvme_subsystem *subsys, int num_ctrls,
+ struct nvme_ctrl *ctrl, int node_id)
+{
+ int node;
+ int found_node = NUMA_NO_NODE;
+ int max = LOCAL_DISTANCE * num_ctrls;
+
+ for_each_node(node) {
+ struct nvme_ctrl *c;
+ int sum = 0;
+
+ list_for_each_entry(c, &subsys->ctrls, subsys_entry)
+ sum += c->node_map[node];
+ if (sum > max) {
+ max = sum;
+ found_node = node;
+ }
+ }
+ if (found_node != NUMA_NO_NODE) {
+ ctrl->node_map[found_node] = LOCAL_DISTANCE;
+ ctrl->node_map[node_id] = REMOTE_DISTANCE;
+ }
+}
+
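+/*
+ * Give the still-unassigned node @node_id one 'local' controller:
+ * pick the controller with the largest total distance, i.e. the one
+ * with the fewest 'local' assignments so far.
+ */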
+void nvme_mpath_balance_node(struct nvme_subsystem *subsys,
+ int num_ctrls, int node_id)
+{
+ struct nvme_ctrl *found = NULL, *ctrl;
+ int max = LOCAL_DISTANCE * num_ctrls, node;
+
+ list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+ int sum = 0;
+
+ for_each_node(node)
+ sum += ctrl->node_map[node];
+ if (sum > max) {
+ max = sum;
+ found = ctrl;
+ }
+ }
+ if (found)
+ found->node_map[node_id] = LOCAL_DISTANCE;
+}
+
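+/*
+ * Recompute and balance the per-node distances of all controllers
+ * in @subsys; called from nvme_mpath_init() whenever a controller
+ * is (re-)initialized.
+ */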
+void nvme_mpath_balance_subsys(struct nvme_subsystem *subsys)
+{
+ struct nvme_ctrl *ctrl;
+ int num_ctrls = 0;
+ int node;
+
+ mutex_lock(&subsys->lock);
+
+ /*
+ * 1. Reset the NUMA distance:
+ * During creation the NUMA distance is only set
+ * per controller, so after connecting the other
+ * controllers the NUMA information on the existing
+ * ones is incorrect.
+ */
+ list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+ for_each_node(node)
+ ctrl->node_map[node] =
+ node_distance(node, ctrl->node_id);
+ num_ctrls++;
+ }
+
+ /*
+ * 2. Distribute optimal paths:
+ * Only one primary path per node.
+ * Additional primary paths are moved to unassigned nodes.
+ */
+ for_each_node(node) {
+ bool optimal = false;
+
+ list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+ if (ctrl->node_map[node] == LOCAL_DISTANCE) {
+ if (!optimal) {
+ optimal = true;
+ continue;
+ }
+ nvme_mpath_distribute_paths(subsys, num_ctrls,
+ ctrl, node);
+ }
+ }
+ }
+
+ /*
+ * 3. Balance unassigned nodes:
+ * Each unassigned node should have one primary path;
+ * the primary path is assigned to the ctrl with the
+ * maximal weight (i.e. the sum of distances over all
+ * nodes), which has the fewest 'local' nodes so far.
+ */
+ for_each_node(node) {
+ bool optimal = false;
+
+ list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+ if (ctrl->node_map[node] == LOCAL_DISTANCE) {
+ optimal = true;
+ break;
+ }
+ }
+ if (optimal)
+ continue;
+ nvme_mpath_balance_node(subsys, num_ctrls, node);
+ }
+
+ mutex_unlock(&subsys->lock);
+}
+
int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
{
struct request_queue *q;
@@ -553,6 +662,8 @@ int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
{
int error;
+ nvme_mpath_balance_subsys(ctrl->subsys);
+
if (!nvme_ctrl_use_ana(ctrl))
return 0;
--
2.16.4