From: Laurent Dufour <ldufour@linux.ibm.com>
To: mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org,
	nathanl@linux.ibm.com, Nick Piggin <npiggin@gmail.com>
Cc: cheloha@linux.ibm.com, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] pseries: prevent free CPU ids to be reused on another node
Date: Tue, 20 Apr 2021 18:34:34 +0200	[thread overview]
Message-ID: <b815f9a6-e4da-8c8e-f207-71c6d122fc40@linux.ibm.com>
In-Reply-To: <20210407153808.59993-1-ldufour@linux.ibm.com>

On 07/04/2021 at 17:38, Laurent Dufour wrote:
> When a CPU is hot added, its CPU ids are taken from the available mask,
> starting from the lowest possible values. If those ids were previously used
> for CPUs attached to a different node, applications see this as if the CPUs
> had migrated from one node to another, which is not expected in real life.
> 
> To prevent this, the CPU ids used on each node need to be recorded and not
> reused on another node. However, to prevent CPU hot plug from failing when
> a node runs out of CPU ids, the ability to reuse other nodes' free CPU ids
> is kept. A warning is displayed in that case to alert the user.
> 
> A new CPU bit mask (node_recorded_ids_map) is introduced for each possible
> node. It is populated with the CPUs onlined at boot time, and updated when
> a CPU is hot plugged to a node. The bits in that mask remain set when a CPU
> is hot unplugged, to record that these CPU ids have been used on this node.
> 
> The effect of this patch can be seen by removing and adding CPUs using the
> QEMU monitor. In the following case, the first CPU of node 2 is removed,
> then the first one of node 1 is removed too. Later, the first CPU of node 2
> is added back. Without this patch, the kernel numbers these CPUs using the
> first available CPU ids, which are the ones freed when removing the first
> CPU of node 1. This leads to CPU ids 16-23 moving from node 1 to node 2.
> With the patch applied, CPU ids 32-39 are used, since they are the lowest
> free ids that have not been used on another node.
> 
> At boot time:
> [root@vm40 ~]# numactl -H | grep cpus
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
> node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
> 
> Vanilla kernel, after the CPU hot unplug/plug operations:
> [root@vm40 ~]# numactl -H | grep cpus
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 1 cpus: 24 25 26 27 28 29 30 31
> node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47
> 
> Patched kernel, after the CPU hot unplug/plug operations:
> [root@vm40 ~]# numactl -H | grep cpus
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> node 1 cpus: 24 25 26 27 28 29 30 31
> node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
> 
> Changes since V3, addressing Nathan's comment:
>   - Rename the local variable named 'nid' into 'assigned_node'
> Changes since V2, addressing Nathan's comments:
>   - Remove the retry feature
>   - Reduce the number of local variables (removing 'i')
>   - Add comment about the cpu_add_remove_lock protecting the added CPU mask.
> Changes since V1 (no functional changes):
>   - update the test's output in the commit's description
>   - node_recorded_ids_map should be static
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
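
For reference, the mechanism described in the commit message boils down to
something like the sketch below. This is simplified and hypothetical (the
real patch also allocates the masks, records the chosen ids for the target
node, and relies on cpu_add_remove_lock for protection):

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/nodemask.h>
#include <linux/printk.h>

/* One mask per possible node, remembering every CPU id ever used there. */
static cpumask_var_t node_recorded_ids_map[MAX_NUMNODES];

/*
 * Build the mask of CPU ids that may be given to a CPU hot added to
 * 'node': the free ids, minus the ids recorded for the other nodes.
 */
static int pick_candidate_ids(int node, cpumask_var_t candidate_mask)
{
	int nid;

	/* Free ids: possible but not currently present. */
	cpumask_andnot(candidate_mask, cpu_possible_mask, cpu_present_mask);

	/* Drop the ids already recorded as used by any other node. */
	for_each_node(nid) {
		if (nid == node)
			continue;
		cpumask_andnot(candidate_mask, candidate_mask,
			       node_recorded_ids_map[nid]);
	}

	/* Starved on this node: fall back to other nodes' free ids. */
	if (cpumask_empty(candidate_mask)) {
		pr_warn("Node %d is out of CPU ids, reusing another node's free ids\n",
			node);
		cpumask_andnot(candidate_mask, cpu_possible_mask,
			       cpu_present_mask);
	}

	return cpumask_empty(candidate_mask) ? -ENOSPC : 0;
}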

I did further LPM (Live Partition Mobility) tests with this patch applied,
and not allowing the fallback of reusing free ids from another node turns out
to be too strong a constraint.

It is easy to hit that limitation when an LPAR is running with the maximum
number of CPUs it is configured for, and when an LPAR migration leads to the
activation of a new node.

For instance, consider a dedicated LPAR configured with a maximum of 32 CPUs
(4 cores, SMT 8). At boot time, cpu_possible_mask is filled with CPU ids 0-31
in smp_setup_cpu_maps() by reading the DT property "/rtas/ibm,lrdr-capacity",
so the highest CPU id for this LPAR is 31.
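
Roughly, that boot-time logic looks like the sketch below (simplified; error
handling and SMT details are omitted, and the exact property layout is
abbreviated):

#include <linux/cpumask.h>
#include <linux/of.h>

/* Simplified sketch of the logic in smp_setup_cpu_maps(). */
void setup_possible_mask_sketch(void)
{
	struct device_node *dn = of_find_node_by_path("/rtas");
	const __be32 *ireg;
	int maxcpus, cpu;

	if (!dn)
		return;

	/* "ibm,lrdr-capacity" encodes, among other things, the maximum
	 * number of CPUs this LPAR may ever be given. */
	ireg = of_get_property(dn, "ibm,lrdr-capacity", NULL);
	if (ireg) {
		maxcpus = be32_to_cpup(ireg + of_n_addr_cells(dn)
					    + of_n_size_cells(dn));

		/* Mark ids 0..maxcpus-1 (0..31 in the example above)
		 * as possible. */
		for (cpu = 0; cpu < maxcpus; cpu++)
			set_cpu_possible(cpu, true);
	}
	of_node_put(dn);
}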

Departure box:
	node 0 : CPU 0-31
Arrival box:
	node 0 : CPU 0-15
	node 1 : CPU 16-31 << need to reuse ids from node 0

Virtualizing the CPU ids would have a big impact, as the CPU id is used in
several places in the kernel to index linear tables.
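
For instance, a very common pattern (illustrative only; 'struct foo_stat' and
the function names here are made up) sizes a table with num_possible_cpus()
and indexes it with the raw CPU id:

#include <linux/slab.h>
#include <linux/smp.h>

/* Hypothetical example of a linear table indexed by CPU id. */
struct foo_stat {
	unsigned long count;
};
static struct foo_stat *foo_stats;

static int __init foo_init(void)
{
	/* One entry per possible CPU, assuming ids run from 0 to N-1. */
	foo_stats = kcalloc(num_possible_cpus(), sizeof(*foo_stats),
			    GFP_KERNEL);
	return foo_stats ? 0 : -ENOMEM;
}

static void foo_account(void)
{
	/* Hot paths index directly with the raw CPU id. */
	foo_stats[smp_processor_id()].count++;
}

Any virtual-to-physical id mapping would have to be honored at every such
indexing site, hence the big impact.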

But in the case where the LPAR is migratable (the DT property
"ibm,migratable-partition" is present), we may raise the highest possible CPU
id to NR_CPUS (usually 2048), to limit the cases where a CPU id has to be
reused on a different node. Doing this will have an impact on some data
allocations done in the kernel whose size is based on num_possible_cpus().
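
Concretely, the proposal would amount to something like this in the boot path
(hypothetical sketch, not actual code; I am assuming here that the property
is found on the root node):

#include <linux/cpumask.h>
#include <linux/of.h>

/* Hypothetical sketch of the proposal. */
void extend_possible_mask_sketch(int maxcpus)
{
	struct device_node *root = of_find_node_by_path("/");
	int cpu;

	if (root && of_get_property(root, "ibm,migratable-partition", NULL)) {
		/*
		 * A migratable LPAR may land on a box with a different
		 * NUMA topology, so keep spare ids for new nodes instead
		 * of capping at the ibm,lrdr-capacity value.
		 */
		maxcpus = NR_CPUS;
	}
	of_node_put(root);

	for (cpu = 0; cpu < maxcpus; cpu++)
		set_cpu_possible(cpu, true);
}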

Any better idea?

Thanks,
Laurent.

