From: Yury Norov <yury.norov@gmail.com>
To: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Cc: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>,
KY Srinivasan <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
"wei.liu@kernel.org" <wei.liu@kernel.org>,
Dexuan Cui <decui@microsoft.com>,
"davem@davemloft.net" <davem@davemloft.net>,
"edumazet@google.com" <edumazet@google.com>,
"kuba@kernel.org" <kuba@kernel.org>,
"pabeni@redhat.com" <pabeni@redhat.com>,
Long Li <longli@microsoft.com>,
"leon@kernel.org" <leon@kernel.org>,
"cai.huoqing@linux.dev" <cai.huoqing@linux.dev>,
"ssengar@linux.microsoft.com" <ssengar@linux.microsoft.com>,
"vkuznets@redhat.com" <vkuznets@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
Paul Rosswurm <paulros@microsoft.com>
Subject: Re: [EXTERNAL] [PATCH 3/3] net: mana: add a function to spread IRQs per CPUs
Date: Tue, 19 Dec 2023 06:03:49 -0800
Message-ID: <ZYGixTdW4PYF3RjR@yury-ThinkPad>
In-Reply-To: <PUZP153MB07886CE88351F6B7A2AA0096CC97A@PUZP153MB0788.APCP153.PROD.OUTLOOK.COM>

On Tue, Dec 19, 2023 at 10:18:49AM +0000, Souradeep Chakrabarti wrote:
>
>
> >-----Original Message-----
> >From: Yury Norov <yury.norov@gmail.com>
> >Sent: Monday, December 18, 2023 3:02 AM
> >To: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>; KY Srinivasan
> ><kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> >wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>; davem@davemloft.net;
> >edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Long Li
> ><longli@microsoft.com>; yury.norov@gmail.com; leon@kernel.org;
> >cai.huoqing@linux.dev; ssengar@linux.microsoft.com; vkuznets@redhat.com;
> >tglx@linutronix.de; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; linux-
> >kernel@vger.kernel.org; linux-rdma@vger.kernel.org
> >Cc: Souradeep Chakrabarti <schakrabarti@microsoft.com>; Paul Rosswurm
> ><paulros@microsoft.com>
> >Subject: [EXTERNAL] [PATCH 3/3] net: mana: add a function to spread IRQs per
> >CPUs
> >
> >Souradeep found that the driver performs faster if IRQs are spread across CPUs
> >with the following heuristics:
> >
> >1. No more than one IRQ per CPU, if possible;
> >2. NUMA locality is the second priority;
> >3. Sibling dislocality is the last priority.
> >
> >Let's consider this topology:
> >
> >Node            0               1
> >Core        0       1       2       3
> >CPU       0   1   2   3   4   5   6   7
> >
> >The most performant IRQ distribution based on the above topology and heuristics
> >may look like this:
> >
> >IRQ     Nodes   Cores   CPUs
> >0       1       0       0-1
> >1       1       1       2-3
> >2       1       0       0-1
> >3       1       1       2-3
> >4       2       2       4-5
> >5       2       3       6-7
> >6       2       2       4-5
> >7       2       3       6-7
> >
> >The irq_setup() routine introduced in this patch leverages the
> >for_each_numa_hop_mask() iterator and assigns IRQs to sibling groups as
> >described above.
> >
> >According to [1], for NUMA-aware but sibling-ignorant IRQ distribution based on
> >cpumask_local_spread() performance test results look like this:
> >
> >./ntttcp -r -m 16
> >NTTTCP for Linux 1.4.0
> >---------------------------------------------------------
> >08:05:20 INFO: 17 threads created
> >08:05:28 INFO: Network activity progressing...
> >08:06:28 INFO: Test run completed.
> >08:06:28 INFO: Test cycle finished.
> >08:06:28 INFO: ##### Totals: #####
> >08:06:28 INFO: test duration :60.00 seconds
> >08:06:28 INFO: total bytes :630292053310
> >08:06:28 INFO: throughput :84.04Gbps
> >08:06:28 INFO: retrans segs :4
> >08:06:28 INFO: cpu cores :192
> >08:06:28 INFO: cpu speed :3799.725MHz
> >08:06:28 INFO: user :0.05%
> >08:06:28 INFO: system :1.60%
> >08:06:28 INFO: idle :96.41%
> >08:06:28 INFO: iowait :0.00%
> >08:06:28 INFO: softirq :1.94%
> >08:06:28 INFO: cycles/byte :2.50
> >08:06:28 INFO: cpu busy (all) :534.41%
> >
> >For NUMA- and sibling-aware IRQ distribution, the same test works 15% faster:
> >
> >./ntttcp -r -m 16
> >NTTTCP for Linux 1.4.0
> >---------------------------------------------------------
> >08:08:51 INFO: 17 threads created
> >08:08:56 INFO: Network activity progressing...
> >08:09:56 INFO: Test run completed.
> >08:09:56 INFO: Test cycle finished.
> >08:09:56 INFO: ##### Totals: #####
> >08:09:56 INFO: test duration :60.00 seconds
> >08:09:56 INFO: total bytes :741966608384
> >08:09:56 INFO: throughput :98.93Gbps
> >08:09:56 INFO: retrans segs :6
> >08:09:56 INFO: cpu cores :192
> >08:09:56 INFO: cpu speed :3799.791MHz
> >08:09:56 INFO: user :0.06%
> >08:09:56 INFO: system :1.81%
> >08:09:56 INFO: idle :96.18%
> >08:09:56 INFO: iowait :0.00%
> >08:09:56 INFO: softirq :1.95%
> >08:09:56 INFO: cycles/byte :2.25
> >08:09:56 INFO: cpu busy (all) :569.22%
> >
> >[1] https://lore.kernel.org/all/20231211063726.GA4977@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
> >
> >Signed-off-by: Yury Norov <yury.norov@gmail.com>
> >Co-developed-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
> >---
> > .../net/ethernet/microsoft/mana/gdma_main.c | 28 +++++++++++++++++++
> > 1 file changed, 28 insertions(+)
> >
> >diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> >index 6367de0c2c2e..11e64e42e3b2 100644
> >--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> >+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> >@@ -1243,6 +1243,34 @@ void mana_gd_free_res_map(struct gdma_resource *r)
> > 	r->size = 0;
> > }
> >
> >+static __maybe_unused int irq_setup(unsigned int *irqs, unsigned int len, int node)
> >+{
> >+	const struct cpumask *next, *prev = cpu_none_mask;
> >+	cpumask_var_t cpus __free(free_cpumask_var);
> >+	int cpu, weight;
> >+
> >+	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
> >+		return -ENOMEM;
> >+
> >+	rcu_read_lock();
> >+	for_each_numa_hop_mask(next, node) {
> >+		weight = cpumask_weight_andnot(next, prev);
> >+		while (weight-- > 0) {
> Make it while (weight > 0) {
> >+			cpumask_andnot(cpus, next, prev);
> >+			for_each_cpu(cpu, cpus) {
> >+				if (len-- == 0)
> >+					goto done;
> >+				irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> >+				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
> Here do --weight; otherwise this code will traverse the same node N^2 times,
> where each node has N CPUs.

Sure.

When building your series on top of this, can you please fix it
in place?

Thanks,
Yury

> >+			}
> >+		}
> >+		prev = next;
> >+	}
> >+done:
> >+	rcu_read_unlock();
> >+	return 0;
> >+}
> >+
> > static int mana_gd_setup_irqs(struct pci_dev *pdev)
> > {
> > 	unsigned int max_queues_per_port = num_online_cpus();
> >--
> >2.40.1
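
For readers following the review comment above, here is a minimal userspace
sketch of the loop, not the driver code itself. It assumes the 8-CPU example
topology from the commit message (two nodes, two SMT siblings per core) and
replaces the kernel cpumask API with plain bitmasks. The names
irq_setup_sim(), sibling_mask(), weight_of() and hop_mask[] are hypothetical
and exist only for this illustration. The 'fixed' flag selects between the
loop as posted (weight decremented once per pass) and the variant suggested
in the review (weight decremented once per assigned IRQ).

```c
#include <assert.h>

#define NCPUS 8
#define NIRQS 8
#define NHOPS 2

/* Assumption: cumulative NUMA hop masks as seen from node 0 of the example
 * topology (CPUs 0-3 are local, CPUs 4-7 are one hop away). */
static const unsigned int hop_mask[NHOPS] = { 0x0f, 0xff };

/* Assumption: two SMT siblings per core, numbered 2n and 2n+1. */
unsigned int sibling_mask(int cpu)
{
	return 0x3u << (cpu & ~1);
}

/* Number of set bits, a stand-in for cpumask_weight(). */
int weight_of(unsigned int mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Simulate the IRQ spreading loop for 'len' IRQs starting at node 0.
 * affinity[i] receives the CPU mask chosen for IRQ i.
 * fixed == 0: weight is decremented once per pass, as in the posted patch.
 * fixed != 0: weight is decremented once per assigned IRQ, as suggested.
 * Returns the number of IRQs that were assigned.
 */
int irq_setup_sim(unsigned int *affinity, int len, int fixed)
{
	unsigned int prev = 0, next, cpus;
	int hop, cpu, weight, assigned = 0;

	assert(len >= 0 && len <= NIRQS);

	for (hop = 0; hop < NHOPS; hop++) {
		next = hop_mask[hop];
		weight = weight_of(next & ~prev);
		while (fixed ? weight > 0 : weight-- > 0) {
			cpus = next & ~prev;
			for (cpu = 0; cpu < NCPUS; cpu++) {
				if (!(cpus & (1u << cpu)))
					continue;
				if (len-- == 0)
					return assigned;
				affinity[assigned++] = sibling_mask(cpu);
				/* Skip the sibling for the rest of this pass. */
				cpus &= ~sibling_mask(cpu);
				if (fixed)
					--weight;
			}
		}
		prev = next;
	}
	return assigned;
}
```

Under these assumptions, calling irq_setup_sim(affinity, 8, 1) gives IRQs 0-3
the masks of CPUs 0-1 and 2-3 alternately and IRQs 4-7 the masks of CPUs 4-5
and 6-7, which matches the table in the commit message. With fixed == 0, the
local node gets four passes instead of two, so all eight IRQs land on CPUs 0-3
before the second node is ever reached.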