[PATCH v4 3/5] net: mana: explain irq_setup() algorithm

linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Shradha Gupta <shradhagupta@linux.microsoft.com>
To: Dexuan Cui <decui@microsoft.com>, Wei Liu <wei.liu@kernel.org>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Konstantin Taranov <kotaranov@microsoft.com>,
	Simon Horman <horms@kernel.org>,
	Leon Romanovsky <leon@kernel.org>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	Erni Sri Satya Vennela <ernis@linux.microsoft.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michael Kelley <mhklinux@outlook.com>
Cc: "Shradha Gupta" <shradhagupta@linux.microsoft.com>,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, "Nipun Gupta" <nipun.gupta@amd.com>,
	"Yury Norov" <yury.norov@gmail.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jonathan Cameron" <Jonathan.Cameron@huwei.com>,
	"Anna-Maria Behnsen" <anna-maria@linutronix.de>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"Long Li" <longli@microsoft.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Rob Herring" <robh@kernel.org>,
	"Manivannan Sadhasivam" <manivannan.sadhasivam@linaro.org>,
	"Krzysztof Wilczy�~Dski" <kw@linux.com>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	"Paul Rosswurm" <paulros@microsoft.com>,
	"Shradha Gupta" <shradhagupta@microsoft.com>
Subject: [PATCH v4 3/5] net: mana: explain irq_setup() algorithm
Date: Tue, 27 May 2025 08:58:25 -0700	[thread overview]
Message-ID: <1748361505-25513-1-git-send-email-shradhagupta@linux.microsoft.com> (raw)
In-Reply-To: <1748361453-25096-1-git-send-email-shradhagupta@linux.microsoft.com>

Commit 91bfe210e196 ("net: mana: add a function to spread IRQs per CPUs")
added the irq_setup() function that distributes IRQs on CPUs according
to a tricky heuristic. The corresponding commit message explains the
heuristic.

Duplicate it in the source code to make available for readers without
digging git in history. Also, add more detailed explanation about how
the heuristics is implemented.

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 41 +++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 4ffaf7588885..f9e8d4d1ba3a 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1288,6 +1288,47 @@ void mana_gd_free_res_map(struct gdma_resource *r)
 	r->size = 0;
 }
 
+/*
+ * Spread on CPUs with the following heuristics:
+ *
+ * 1. No more than one IRQ per CPU, if possible;
+ * 2. NUMA locality is the second priority;
+ * 3. Sibling dislocality is the last priority.
+ *
+ * Let's consider this topology:
+ *
+ * Node            0               1
+ * Core        0       1       2       3
+ * CPU       0   1   2   3   4   5   6   7
+ *
+ * The most performant IRQ distribution based on the above topology
+ * and heuristics may look like this:
+ *
+ * IRQ     Nodes   Cores   CPUs
+ * 0       1       0       0-1
+ * 1       1       1       2-3
+ * 2       1       0       0-1
+ * 3       1       1       2-3
+ * 4       2       2       4-5
+ * 5       2       3       6-7
+ * 6       2       2       4-5
+ * 7       2       3       6-7
+ *
+ * The heuristics is implemented as follows.
+ *
+ * The outer for_each() loop resets the 'weight' to the actual number
+ * of CPUs in the hop. Then inner for_each() loop decrements it by the
+ * number of sibling groups (cores) while assigning first set of IRQs
+ * to each group. IRQs 0 and 1 above are distributed this way.
+ *
+ * Now, because NUMA locality is more important, we should walk the
+ * same set of siblings and assign 2nd set of IRQs (2 and 3), and it's
+ * implemented by the medium while() loop. We do like this unless the
+ * number of IRQs assigned on this hop will not become equal to number
+ * of CPUs in the hop (weight == 0). Then we switch to the next hop and
+ * do the same thing.
+ */
+
 static int irq_setup(unsigned int *irqs, unsigned int len, int node)
 {
 	const struct cpumask *next, *prev = cpu_none_mask;
-- 
2.34.1

next prev parent reply	other threads:[~2025-05-27 15:58 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-27 15:57 [PATCH v4 0/5] Allow dyn MSI-X vector allocation of MANA Shradha Gupta
2025-05-27 15:57 ` [PATCH v4 1/5] PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations Shradha Gupta
2025-05-29  3:46   ` Saurabh Singh Sengar
2025-05-27 15:58 ` [PATCH v4 2/5] PCI: hv: Allow dynamic MSI-X vector allocation Shradha Gupta
2025-05-29  3:46   ` Saurabh Singh Sengar
2025-05-27 15:58 ` Shradha Gupta [this message]
2025-05-27 19:10   ` [PATCH v4 3/5] net: mana: explain irq_setup() algorithm Yury Norov
2025-05-29 13:15     ` Shradha Gupta
2025-05-27 15:58 ` [PATCH v4 4/5] net: mana: Allow irq_setup() to skip cpus for affinity Shradha Gupta
2025-05-27 15:59 ` [PATCH v4 5/5] net: mana: Allocate MSI-X vectors dynamically Shradha Gupta
2025-05-28  8:16   ` Saurabh Singh Sengar
2025-05-29 13:17     ` Shradha Gupta
2025-05-28 18:52   ` Simon Horman
2025-05-29 13:18     ` Shradha Gupta
2025-05-29  3:45   ` Saurabh Singh Sengar
2025-05-29 13:20     ` Shradha Gupta
2025-05-28 18:55 ` [PATCH v4 0/5] Allow dyn MSI-X vector allocation of MANA Simon Horman
2025-05-29 13:28   ` Shradha Gupta
2025-05-30 18:07     ` Simon Horman
2025-06-03  4:15       ` Shradha Gupta
2025-06-01 14:53 ` Zhu Yanjun
2025-06-03  4:17   ` Shradha Gupta

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:4ffaf758888 dfblob:f9e8d4d1ba3 )
 OR (
bs:"[PATCH v4 3/5] net: mana: explain irq_setup() algorithm" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1748361505-25513-1-git-send-email-shradhagupta@linux.microsoft.com \
    --to=shradhagupta@linux.microsoft.com \
    --cc=Jonathan.Cameron@huwei.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=anna-maria@linutronix.de \
    --cc=bhelgaas@google.com \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=ernis@linux.microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=horms@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=kevin.tian@intel.com \
    --cc=kotaranov@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kw@linux.com \
    --cc=kys@microsoft.com \
    --cc=leon@kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=lpieralisi@kernel.org \
    --cc=manivannan.sadhasivam@linaro.org \
    --cc=mhklinux@outlook.com \
    --cc=mlevitsk@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=nipun.gupta@amd.com \
    --cc=pabeni@redhat.com \
    --cc=paulros@microsoft.com \
    --cc=peterz@infradead.org \
    --cc=robh@kernel.org \
    --cc=shradhagupta@microsoft.com \
    --cc=tglx@linutronix.de \
    --cc=wei.liu@kernel.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).