Re: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Yury Norov <yury.norov@gmail.com>
To: Michael Kelley <mhklinux@outlook.com>
Cc: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>,
	"kys@microsoft.com" <kys@microsoft.com>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	"decui@microsoft.com" <decui@microsoft.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"edumazet@google.com" <edumazet@google.com>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	"longli@microsoft.com" <longli@microsoft.com>,
	"leon@kernel.org" <leon@kernel.org>,
	"cai.huoqing@linux.dev" <cai.huoqing@linux.dev>,
	"ssengar@linux.microsoft.com" <ssengar@linux.microsoft.com>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"schakrabarti@microsoft.com" <schakrabarti@microsoft.com>,
	"paulros@microsoft.com" <paulros@microsoft.com>
Subject: Re: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs
Date: Tue, 9 Jan 2024 15:28:59 -0800	[thread overview]
Message-ID: <ZZ3Wsxq8rHShTUdA@yury-ThinkPad> (raw)
In-Reply-To: <SN6PR02MB4157CB3CB55A17255AE61BF6D46A2@SN6PR02MB4157.namprd02.prod.outlook.com>

Hi Michael,

So, I'm just a guy who helped to formulate the heuristics in an
itemized form, and implement them using the existing kernel API.
I have no access to MANA machines and I ran no performance tests
myself.

On Tue, Jan 09, 2024 at 07:22:38PM +0000, Michael Kelley wrote:
> From: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Sent: Tuesday, January 9, 2024 2:51 AM
> > 
> > From: Yury Norov <yury.norov@gmail.com>
> > 
> > Souradeep investigated that the driver performs faster if IRQs are
> > spread on CPUs with the following heuristics:
> > 
> > 1. No more than one IRQ per CPU, if possible;
> > 2. NUMA locality is the second priority;
> > 3. Sibling dislocality is the last priority.
> > 
> > Let's consider this topology:
> > 
> > Node            0               1
> > Core        0       1       2       3
> > CPU       0   1   2   3   4   5   6   7
> > 
> > The most performant IRQ distribution based on the above topology
> > and heuristics may look like this:
> > 
> > IRQ     Nodes   Cores   CPUs
> > 0       1       0       0-1
> > 1       1       1       2-3
> > 2       1       0       0-1
> > 3       1       1       2-3
> > 4       2       2       4-5
> > 5       2       3       6-7
> > 6       2       2       4-5
> > 7       2       3       6-7
> 
> I didn't pay attention to the detailed discussion of this issue
> over the past 2 to 3 weeks during the holidays in the U.S., but
> the above doesn't align with the original problem as I understood
> it.  I thought the original problem was to avoid putting IRQs on
> both hyper-threads in the same core, and that the perf
> improvements are based on that configuration.  At least that's
> what the commit message for Patch 4/4 in this series says.

Yes, and the original distribution suggested by Souradeep looks very
similar:

  IRQ     Nodes   Cores   CPUs
  0       1       0       0
  1       1       1       2
  2       1       0       1
  3       1       1       3
  4       2       2       4
  5       2       3       6
  6       2       2       5
  7       2       3       7

I just added a bit more flexibility, so that kernel may pick any
sibling for the IRQ. As I understand, both approaches have similar
performance. Probably my fine-tune added another half-percent...

Souradeep, can you please share the exact numbers on this?

> The above chart results in 8 IRQs being assigned to the 8 CPUs,
> probably with 1 IRQ per CPU.   At least on x86, if the affinity
> mask for an IRQ contains multiple CPUs, matrix_find_best_cpu()
> should balance the IRQ assignments between the CPUs in the mask.
> So the original problem is still present because both hyper-threads
> in a core are likely to have an IRQ assigned.

That's what I think, if the topology makes us to put IRQs in the
same sibling group, the best thing we can to is to rely on existing
balancing mechanisms in a hope that they will do their job well.

> Of course, this example has 8 IRQs and 8 CPUs, so assigning an
> IRQ to every hyper-thread may be the only choice.  If that's the
> case, maybe this just isn't a good example to illustrate the
> original problem and solution.

Yeah... This example illustrates the order of IRQ distribution.
I really doubt that if we distribute IRQs like in the above example,
there would be any difference in performance. But I think it's quite
a good illustration. I could write the title for the table like this:

        The order of IRQ distribution for the best performance
        based on [...] may look like this.

> But even with a better example
> where the # of IRQs is <= half the # of CPUs in a NUMA node,
> I don't think the code below accomplishes the original intent.
> 
> Maybe I've missed something along the way in getting to this
> version of the patch.  Please feel free to set me straight. :-)

Hmm. So if the number of IRQs is the half # of CPUs in the nodes,
which is 2 in the example above, the distribution will look like
this:

  IRQ     Nodes   Cores   CPUs
  0       1       0       0-1
  1       1       1       2-3

And each IRQ belongs to a different sibling group. This follows
the rules above.

I think of it like we assign an IRQ to a group of 2 CPUs, so from
the heuristic #1 perspective, each CPU is assigned with 1/2 of the
IRQ.

If I add one more IRQ, then according to the heuristics, NUMA locality
trumps sibling dislocality, so we'd assign IRO to the same node on any
core. My algorithm assigns it to the core #0:

  2       1       0       0-1

This doubles # of IRQs for the CPUs 0 and 1: from 1/2 to 1.

The next IRQ should be assigned to the same node again, and we've got
the only choice:


  3       1       1       2-3

Starting from IRQ #5, the node #1 is full - each CPU is assigned with
exactly one IRQ, and the heuristic #1 makes us to switch to the other
node; and then do the same thing:

  4       2       2       4-5
  5       2       3       6-7
  6       2       2       4-5
  7       2       3       6-7

So I think the algorithm is correct... Really hope the above makes
sense. :) If so, I can add it to the commit message for patch #3.

Nevertheless... Souradeep, in addition to the performance numbers, can
you share your topology and actual IRQ distribution that gains 15%? I
think it should be added to the patch #4 commit message.

Thanks,
Yury

next prev parent reply	other threads:[~2024-01-09 23:29 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-09 10:51 [PATCH 0/4 net-next] net: mana: Assigning IRQ affinity on HT cores Souradeep Chakrabarti
2024-01-09 10:51 ` [PATCH 1/4 net-next] cpumask: add cpumask_weight_andnot() Souradeep Chakrabarti
2024-01-09 10:51 ` [PATCH 2/4 net-next] cpumask: define cleanup function for cpumasks Souradeep Chakrabarti
2024-01-09 10:51 ` [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs Souradeep Chakrabarti
2024-01-09 19:22   ` Michael Kelley
2024-01-09 20:20     ` Haiyang Zhang
2024-01-10  9:08       ` Souradeep Chakrabarti
2024-01-09 23:28     ` Yury Norov [this message]
2024-01-10  0:27       ` Michael Kelley
2024-01-11  6:13         ` Souradeep Chakrabarti
2024-01-12 16:36           ` Michael Kelley
2024-01-12 18:30             ` Haiyang Zhang
2024-01-13  6:30               ` Souradeep Chakrabarti
2024-01-13 16:20                 ` Michael Kelley
2024-01-13 19:11                   ` Yury Norov
2024-01-16  6:13                     ` Souradeep Chakrabarti
2024-01-10  9:09       ` Souradeep Chakrabarti
2024-01-09 10:51 ` [PATCH 4/4 net-next] net: mana: Assigning IRQ affinity on HT cores Souradeep Chakrabarti
2024-01-09 11:57 ` [PATCH 0/4 " Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZZ3Wsxq8rHShTUdA@yury-ThinkPad \
    --to=yury.norov@gmail.com \
    --cc=cai.huoqing@linux.dev \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kys@microsoft.com \
    --cc=leon@kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=mhklinux@outlook.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=paulros@microsoft.com \
    --cc=schakrabarti@linux.microsoft.com \
    --cc=schakrabarti@microsoft.com \
    --cc=ssengar@linux.microsoft.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).