public inbox for linux-kernel@vger.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Cc: hch@lst.de, kbusch@kernel.org, linux-kernel@vger.kernel.org,
	maz@kernel.org, tglx@linutronix.de
Subject: Re: [PATCH] irq: consider cpus on nodes are unbalanced
Date: Fri, 17 Dec 2021 14:57:12 +0800
Message-ID: <Ybw0yPhi01lro7m2@T590>
In-Reply-To: <20211217024805.303738-1-yamamoto.rei@jp.fujitsu.com>

On Fri, Dec 17, 2021 at 11:48:05AM +0900, Rei Yamamoto wrote:
> On Wed, Dec 15, 2021 at 12:33, Ming Lei wrote:
> >> >> If cpus on a node are offline at boot time, there is a
> >> >> difference in the number of nodes between building affinity
> >> >> masks for present cpus and building affinity masks for possible
> >> >> cpus.
> >
> > There is always a difference between the two node counts: the first is
> > the number of nodes covering present cpus, and the second is the number
> > of nodes covering the other possible cpus that have not been spread yet.
> 
> In this case, building affinity masks for possible cpus would change even
> the affinity mask bits for present cpus in the "if (numvecs <= nodes)" route.
> This is the second problem I mentioned.
> I will explain the actual case below.
> 
> >
> >> >> This patch fixes 2 problems caused by the difference in the
> >
> > Is there any user-visible problem?
> 
> A panic occurred in the lpfc driver.
> 
> >
> >> >> number of nodes:
> >> >>
> >> >>  - If some unused vectors remain after building masks for present cpus,
> >
> > We just select a new vector for starting the spread if unallocated
> > vectors remain, but the number for allocation is still numvecs. We hope both
> > present cpus and non-present cpus can be balanced on each vector, so that each
> > vector may get a present cpu allocated.
> 
> I understood.
> I withdraw the first problem I mentioned.
> 
> >
> >> >>    the remaining vectors are assigned when building masks for possible
> >> >>    cpus. Therefore the "numvecs <= nodes" condition must be
> >> >>    "vecs_to_assign <= nodes_to_assign". Fix this problem by making this
> >> >>    condition appropriate.
> >> >>
> >> >>  - The "numvecs <= nodes" branch can overwrite bits of the masks for
> >> >>    present cpus while building masks for possible cpus. Fix this
> >> >>    problem by leaving CPU bits that are not targeted unchanged.
> >
> > 'numvecs' is always the total number of vectors for assigning CPUs. If
> > that number is <= nodes, we just assign the interested cpus of each whole
> > node to a vector until all interested cpus have been allocated.
> >
> >
> >> Do you have any comments?
> >
> > I don't see issues with the current approach. Can you explain the real
> > user-visible problem in more detail?
> 
> I experienced a panic in the lpfc driver caused by broken affinity masks.
> 
> The system had the following configuration:
> -----
> node num: cpu num
> Node #0: #0 #1 (#4 #8 #12)
> Node #1: #2 #3 (#5 #9 #13)
> Node #2: (#6 #10 #14)
> Node #3: (#7 #11 #15)
> 
> Number of CPUs: 16
> Present CPU: cpu0, cpu1, cpu2, cpu3
> Number of nodes covering present cpus: 2
> Number of nodes covering possible cpus: 4
> Number of vectors: 4
> -----
> 
> Due to the configuration above, cpumask_var_t *node_to_cpumask was as follows:
> -----
> node_to_cpumask[0] = 0x1113
> node_to_cpumask[1] = 0x222c
> node_to_cpumask[2] = 0x4440
> node_to_cpumask[3] = 0x8880
> -----
> 
> As a result of assigning vectors for present cpus, masks[].mask were as follows:
> -----
> masks[vec1].mask = 0x0004
> masks[vec2].mask = 0x0008
> masks[vec3].mask = 0x0001
> masks[vec4].mask = 0x0002
> -----
> 
> As a result of assigning vectors for possible cpus, masks[].mask were as follows:
> -----
> masks[vec1].mask = 0x1117
> masks[vec2].mask = 0x222c
> masks[vec3].mask = 0x4441
> masks[vec4].mask = 0x8882
> -----
> 
> The problem I encountered was that multiple vectors were unexpectedly
> assigned to a single present cpu.
> For example, both vec1 and vec3 were assigned to cpu0.
> Because of these masks, the panic occurred in the lpfc driver.
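
The overlap in the report can be reproduced with a minimal sketch in plain C
that models each cpumask as a 16-bit integer (bit i = cpu i). This is only a
model of the "numvecs <= nodes" branch, not the kernel code:
spread_whole_nodes() and vectors_for_cpu() are hypothetical helpers, and
starting the spread at the first vector is an assumption chosen to match the
reported masks.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES 4
#define NR_VECS  4

/* Per-node CPU masks taken from the report (bit i = cpu i). */
static const uint16_t node_to_cpumask[NR_NODES] = {
	0x1113, 0x222c, 0x4440, 0x8880,
};

/*
 * Hypothetical model of the unfixed "numvecs <= nodes" branch: OR the
 * whole CPU mask of each node into successive vectors, wrapping around
 * after the last vector.
 */
static void spread_whole_nodes(uint16_t masks[NR_VECS], unsigned int curvec)
{
	unsigned int n;

	assert(curvec < NR_VECS);
	for (n = 0; n < NR_NODES; n++) {
		masks[curvec] |= node_to_cpumask[n];
		if (++curvec == NR_VECS)
			curvec = 0;
	}
}

/* Count how many vectors contain the given CPU in their mask. */
static unsigned int vectors_for_cpu(const uint16_t masks[NR_VECS],
				    unsigned int cpu)
{
	unsigned int v, count = 0;

	assert(cpu < 16);
	for (v = 0; v < NR_VECS; v++)
		if (masks[v] & (1u << cpu))
			count++;
	return count;
}
```

Starting from the present-cpu masks { 0x0004, 0x0008, 0x0001, 0x0002 } and
calling spread_whole_nodes(masks, 0) gives 0x1117, 0x222c, 0x4441, 0x8882,
and vectors_for_cpu(masks, 0) returns 2, which is the cpu0 overlap between
vec1 and vec3 described above.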

OK, I understand the issue now. Only the following part is needed,
since nmsk won't be empty:


diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f7ff8919dc9b..d2d01565d2ec 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -269,8 +269,9 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 	 */
 	if (numvecs <= nodes) {
 		for_each_node_mask(n, nodemsk) {
+			cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
 			cpumask_or(&masks[curvec].mask, &masks[curvec].mask,
-				   node_to_cpumask[n]);
+				   nmsk);
 			if (++curvec == last_affv)
 				curvec = firstvec;
 		}
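
As a rough check of the change above against the reported configuration, here
is a small sketch in plain C with 16-bit integers standing in for cpumasks.
It is not the kernel code: spread_masked_nodes() and cpus_are_exclusive() are
hypothetical helpers, and it assumes that cpu_mask in the second pass holds
only the possible cpus that are not present (0xfff0 here) and that the spread
starts at the first vector.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES 4
#define NR_VECS  4

/* Per-node CPU masks taken from the report (bit i = cpu i). */
static const uint16_t node_to_cpumask[NR_NODES] = {
	0x1113, 0x222c, 0x4440, 0x8880,
};

/*
 * Hypothetical model of the patched branch: only the CPUs of a node
 * that are also in cpu_mask are ORed into the current vector.
 */
static void spread_masked_nodes(uint16_t masks[NR_VECS], uint16_t cpu_mask,
				unsigned int curvec)
{
	unsigned int n;

	assert(curvec < NR_VECS);
	for (n = 0; n < NR_NODES; n++) {
		uint16_t nmsk = cpu_mask & node_to_cpumask[n];

		masks[curvec] |= nmsk;
		if (++curvec == NR_VECS)
			curvec = 0;
	}
}

/* Return 1 if no CPU in 'cpus' appears in more than one vector mask. */
static int cpus_are_exclusive(const uint16_t masks[NR_VECS], uint16_t cpus)
{
	uint16_t seen = 0;
	unsigned int v;

	for (v = 0; v < NR_VECS; v++) {
		uint16_t hit = masks[v] & cpus;

		if (hit & seen)
			return 0;
		seen |= hit;
	}
	return 1;
}
```

Under those assumptions, starting from { 0x0004, 0x0008, 0x0001, 0x0002 } and
calling spread_masked_nodes(masks, 0xfff0, 0) gives 0x1114, 0x2228, 0x4441,
0x8882, and cpus_are_exclusive(masks, 0x000f) returns 1, so each present cpu
stays on exactly one vector.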

Thanks,
Ming



Thread overview: 7+ messages
2021-10-29  8:27 [PATCH] irq: consider cpus on nodes are unbalanced Rei Yamamoto
2021-11-24 19:33 ` Thomas Gleixner
2021-12-15  1:57   ` Rei Yamamoto
2021-12-15  4:33     ` Ming Lei
2021-12-17  2:48       ` Rei Yamamoto
2021-12-17  6:57         ` Ming Lei [this message]
2021-12-17  7:12           ` Rei Yamamoto
