From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>, linuxppc-dev@lists.ozlabs.org
Subject: Re: [4/6] pseries: Add CPU dlpar remove functionality
Date: Tue, 21 Jul 2015 16:34:11 -0500 [thread overview]
Message-ID: <55AEBAD3.6080601@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150721092732.B800B140E0F@ozlabs.org>
On 07/21/2015 04:27 AM, Michael Ellerman wrote:
> On Mon, 2015-22-06 at 21:00:49 UTC, Nathan Fontenot wrote:
>> Add the ability to dlpar remove CPUs via hotplug rtas events, either by
>> specifying the drc-index of the CPU to remove or providing a count of cpus
>> to remove.
>>
>> To accomplish we create a list of possible dr cpus and their drc indexes
>
> What does "dr" mean? I know but not everyone does. If it's an acronymn it
> should be uppercase and spelt out at its first usage.
>
Good point.
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 7890b2f..49b7196 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -506,6 +507,207 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
>> return rc;
>> }
>>
>> +static struct device_node *cpu_drc_index_to_dn(struct device_node *parent,
>> + u32 drc_index)
>> +{
>> + struct device_node *dn;
>> + u32 my_index;
>> + int rc;
>> +
>> + for_each_child_of_node(parent, dn) {
>> + if (of_node_cmp(dn->type, "cpu") != 0)
>> + continue;
>
> parent is always "/cpus", so I wonder if this should just be:
>
> for_each_node_by_type(dn, "cpu") {
>
> And then you don't need parent in here at all.
>
> Or do we realistically expect to be looking for cpus under a node other than "/cpus" ?
This, combined with the comments below about iterating over the list
of dr_cpus, is going away in the next version of the patch. The 'parent'
node pointer is not passed around anymore.
>
>> + rc = of_property_read_u32(dn, "ibm,my-drc-index", &my_index);
>> + if (rc)
>> + continue;
>> +
>> + if (my_index == drc_index)
>> + break;
>> + }
>> +
>> + return dn;
>> +}
>> +
>> +static int dlpar_cpu_remove_by_index(struct device_node *parent,
>> + u32 drc_index)
>> +{
>> + struct device_node *dn;
>> + int rc;
>> +
>> + dn = cpu_drc_index_to_dn(parent, drc_index);
>> + if (!dn)
>> + return -ENODEV;
>> +
>> + rc = dlpar_cpu_remove(dn, drc_index);
>> + of_node_put(dn);
>> + return rc;
>> +}
>> +
>> +static int dlpar_cpus_possible(struct device_node *parent)
>> +{
>> + int dr_cpus_possible;
>> +
>> + /* The first u32 in the ibm,drc-indexes property is the numnber
>> + * of drc entries in the property, which is the possible number
>> + * number of dr capable cpus.
>> + */
>> + of_property_read_u32(parent, "ibm,drc-indexes", &dr_cpus_possible);
>> + return dr_cpus_possible;
>> +}
>> +
>> +struct dr_cpu {
>> + u32 drc_index;
>> + bool present;
>> + bool modified;
>> +};
>> +
>> +static struct dr_cpu *get_dlpar_cpus(struct device_node *parent)
>> +{
>> + struct device_node *dn;
>> + struct property *prop;
>> + struct dr_cpu *dr_cpus;
>> + const __be32 *p;
>> + u32 drc_index;
>> + int dr_cpus_possible, index;
>> + bool first;
>> +
>> + dr_cpus_possible = dlpar_cpus_possible(parent);
>> + dr_cpus = kcalloc(dr_cpus_possible, sizeof(*dr_cpus), GFP_KERNEL);
>> + if (!dr_cpus)
>> + return NULL;
>> +
>> + first = true;
>> + index = 0;
>> + of_property_for_each_u32(parent, "ibm,drc-indexes", prop, p,
>> + drc_index) {
>> + if (first) {
>
> What about:
>
> if (index == 0) {
>
> Then you don't need first at all?
>
This block of code is getting re-written, no need to look for the
first node in the new version.
>> + /* The first entry is the number of drc indexes in
>> + * the property, skip it.
>> + */
>> + first = false;
>> + continue;
>> + }
>> +
>> + dr_cpus[index].drc_index = drc_index;
>> +
>> + dn = cpu_drc_index_to_dn(parent, drc_index);
>
> The outer loop is iterating over all drc indexes (ie. one per cpu usually?),
> and then cpu_drc_index_to_dn() will iterate over all cpus.
>
> So that's n^2 for n = nr_cpus which is not ideal.
>
> I realise we can't do much better, but what we can do is limit the number of
> iterations of the outer loop. Because usually we will be here because we want
> to offline less than nr_cpus cpus.
>
> ie. if I ask to offline 1 cpu on a 4096 cpu system we will still do up to 16M
> iterations in here, when all we needed to do was one iteration of the outer
> loop.
>
> So I think this routine should take the number of cpus we're trying to remove
> and only iterate until it's found that many.
>
Yeah, now that you point it out there is a lot of no good in that code.
Re-writing this with a new approach to only create lists of what is present.
>> + if (dn) {
>> + dr_cpus[index].present = true;
>> + of_node_put(dn);
>> + }
>
> Why do we put non present ones in the dr_cpus array at all? It just means you
> have to skip them later? (in two separate places)
>
>> +
>> + index++;
>> + }
>> +
>> + return dr_cpus;
>> +}
>> +
>> +static int dlpar_cpu_remove_by_count(struct device_node *parent,
>> + u32 cpus_to_remove)
>> +{
>> + struct dr_cpu *dr_cpus;
>> + int dr_cpus_removed = 0;
>> + int dr_cpus_present = 0;
>> + int dr_cpus_possible;
>> + int i, rc;
>> +
>> + pr_info("Attempting to hot-remove %d CPUs\n", cpus_to_remove);
>> +
>> + dr_cpus = get_dlpar_cpus(parent);
>
> So I think this should be:
>
>> + dr_cpus = get_dlpar_cpus(parent, cpus_to_remove);
>
>> + if (!dr_cpus) {
>> + pr_info("Could not gather dr CPU info\n");
>> + return -EINVAL;
>> + }
>
> And get_dlpar_cpus() should return cpus_to_remove worth of present cpus, or it
> should error, meaning the below then won't be needed:
>
When adding cpus by count we may need more than just cpus_to_remove worth
of present cpus. The goal was to provide all possibilities so we could
continue trying to satisfy the request even if one or more cpus fails
to remove.
>From this comment and comments below I think your approach is that we
should bail if any error occurs during cpu remove. Is this what we
should be doing?
>> + dr_cpus_possible = dlpar_cpus_possible(parent);
>> +
>> + for (i = 0; i < dr_cpus_possible; i++) {
>> + if (dr_cpus[i].present)
>> + dr_cpus_present++;
>> + }
>> +
>> + /* Validate the available CPUs to remove.
>> + * NOTE: we can't remove the last CPU.
>> + */
>> + if (cpus_to_remove >= dr_cpus_present) {
>> + pr_err("Insufficient CPUs (%d) to satisfy remove request\n",
>> + dr_cpus_present);
>> + kfree(dr_cpus);
>> + return -EINVAL;
>> + }
>> +
>
> Having only present cpus in dr_cpus should also simplify this loop because you
> don't need to skip non-present ones:
Fixed in updated code.
>
>> + for (i = 0; i < dr_cpus_possible; i++) {
>> + if (dr_cpus_removed == cpus_to_remove)
>> + break;
>> +
>> + if (!dr_cpus[i].present)
>> + continue;
>> +
>> + rc = dlpar_cpu_remove_by_index(parent, dr_cpus[i].drc_index);
>> + if (!rc) {
>> + dr_cpus_removed++;
>> + dr_cpus[i].modified = true;
>> + }
>
> else .. ?
>
> Surely you should bail on the first error?
>
> Definitely you should, because you're going to undo everything below anyway:
See comment above, I kept going here in the hopes that we could satisfy the
request even if a failure occurred.
>
>> + }
>> +
>> + if (dr_cpus_removed != cpus_to_remove) {
>> + pr_info("CPU hot-remove failed, adding any removed CPUs\n");
>> +
>> + for (i = 0; i < dr_cpus_possible; i++) {
>> + if (!dr_cpus[i].modified)
>> + continue;
>
> And if you bail on the first error above you shouldn't need modified, instead
> you just iterate in reverse from i using a new counter, eg. j.
Yep, I'll work that into the new code.
>
>> +
>> + rc = dlpar_cpu_add(parent, dr_cpus[i].drc_index);
>> + if (rc)
>> + pr_info("Failed to re-add CPU (%x)\n",
>> + dr_cpus[i].drc_index);
>> + }
>> +
>> + rc = -EINVAL;
>> + } else {
>> + rc = 0;
>> + }
>> +
>> + kfree(dr_cpus);
>> + return rc;
>> +}
>> +
>> +int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>> +{
>> + struct device_node *parent;
>> + u32 count, drc_index;
>> + int rc;
>> +
>> + count = hp_elog->_drc_u.drc_count;
>> + drc_index = hp_elog->_drc_u.drc_index;
>
> Those both need be32_to_cpu().
>
> arch/powerpc/platforms/pseries/hotplug-cpu.c:750:15: warning: incorrect type in assignment (different base types)
> arch/powerpc/platforms/pseries/hotplug-cpu.c:750:15: expected unsigned int [unsigned] [usertype] count
> arch/powerpc/platforms/pseries/hotplug-cpu.c:750:15: got restricted __be32 [usertype] drc_count
> arch/powerpc/platforms/pseries/hotplug-cpu.c:751:19: warning: incorrect type in assignment (different base types)
> arch/powerpc/platforms/pseries/hotplug-cpu.c:751:19: expected unsigned int [unsigned] [usertype] drc_index
> arch/powerpc/platforms/pseries/hotplug-cpu.c:751:19: got restricted __be32 [usertype] drc_index
>
Will fix.
>
> Though looking closer I don't see where we ever pass or receive a
> pseries_hp_errorlog to or from firmware? So I'm a bit confused why we're
> bothering with the __be32 shenanigans. Hopefully I've just missed a detail
> somewhere.
>
That patch is coming. For hotplug in KVM guests the pseries_hp_errorlog
is received when we call rtas_check_exception(). Currently these are
sent up to rtas_errd in userspace, When this partchset goes in I planned
on sending a patch to have cpu and memory hotplug requests handled entirely
in the kernel instead of going to userspace.
>> + parent = of_find_node_by_path("/cpus");
>> + if (!parent)
>> + return -ENODEV;
>> +
>> + lock_device_hotplug();
>> +
>> + switch (hp_elog->action) {
>> + case PSERIES_HP_ELOG_ACTION_REMOVE:
>> + if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
>> + rc = dlpar_cpu_remove_by_count(parent, count);
>> + else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
>> + rc = dlpar_cpu_remove_by_index(parent, drc_index);
>> + else
>> + rc = -EINVAL;
>> + break;
>> + default:
>> + pr_err("Invalid action (%d) specified\n", hp_elog->action);
>> + rc = -EINVAL;
>> + break;
>> + }
>> +
>> + unlock_device_hotplug();
>> + of_node_put(parent);
>> + return rc;
>> +}
>> +
>> #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
>>
>> static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
>> diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
>> index 8411c27..58892f1 100644
>> --- a/arch/powerpc/platforms/pseries/pseries.h
>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>> @@ -66,11 +66,16 @@ extern int dlpar_release_drc(u32 drc_index);
>>
>> #ifdef CONFIG_MEMORY_HOTPLUG
>> int dlpar_memory(struct pseries_hp_errorlog *hp_elog);
>> +int dlpar_cpu(struct pseries_hp_errorlog *hp_elog);
>
> I don't think this should be under CONFIG_MEMORY_HOTPLUG ?
>
No it should not.
Thanks for the review.
-Nathan
next prev parent reply other threads:[~2015-07-21 21:34 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-22 20:54 [PATCH 0/6] pseries: Move CPU dlpar into the kernel Nathan Fontenot
2015-06-22 20:56 ` [PATCH 1/6] pseries: Consolidate CPU hotplug code to hotplug-cpu.c Nathan Fontenot
2015-06-22 20:58 ` [PATCH 2/6] pseries: Factor out common cpu hotplug code Nathan Fontenot
2015-06-22 20:59 ` [PATCH 3/6] pseries: Update CPU hotplug error recovery Nathan Fontenot
2015-07-21 4:46 ` [3/6] " Michael Ellerman
2015-07-21 19:14 ` Nathan Fontenot
2015-07-22 1:14 ` Michael Ellerman
2015-06-22 21:00 ` [PATCH 4/6] pseries: Add CPU dlpar remove functionality Nathan Fontenot
2015-07-21 9:27 ` [4/6] " Michael Ellerman
2015-07-21 21:34 ` Nathan Fontenot [this message]
2015-07-22 1:11 ` Michael Ellerman
2015-07-22 15:42 ` Nathan Fontenot
2015-07-21 9:46 ` Michael Ellerman
2015-06-22 21:01 ` [PATCH 5/6] pseries: Add CPU dlpar add functionality Nathan Fontenot
2015-06-22 21:02 ` [PATCH 6/6] pseries: Enable kernel CPU dlpar from sysfs Nathan Fontenot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55AEBAD3.6080601@linux.vnet.ibm.com \
--to=nfont@linux.vnet.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).