From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
To: ego@linux.vnet.ibm.com, Michael Ellerman <mpe@ellerman.id.au>
Cc: tyreld@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] powerpc/pseries: Disable CPU hotplug across migrations
Date: Mon, 24 Sep 2018 09:30:18 -0500
Message-ID: <b61828ef-b07f-93b1-6185-d4d62bd8fd77@linux.vnet.ibm.com>
In-Reply-To: <20180924085606.GA16671@in.ibm.com>

On 09/24/2018 03:56 AM, Gautham R Shenoy wrote:
> Hi Michael,
>
> On Mon, Sep 24, 2018 at 05:00:42PM +1000, Michael Ellerman wrote:
>> Nathan Fontenot <nfont@linux.vnet.ibm.com> writes:
>>> On 09/18/2018 05:32 AM, Gautham R Shenoy wrote:
>>>> Hi Nathan,
>>>> On Tue, Sep 18, 2018 at 1:05 AM Nathan Fontenot
>>>> <nfont@linux.vnet.ibm.com> wrote:
>>>>>
>>>>> When performing partition migrations, all present CPUs must be online,
>>>>> as all present CPUs must make the H_JOIN call as part of the migration
>>>>> process. Once all present CPUs make the H_JOIN call, one CPU is returned
>>>>> to make the rtas call to perform the migration to the destination system.
>>>>>
>>>>> During testing of migration and changing the SMT state, we have found
>>>>> instances where CPUs are offlined, as part of the SMT state change,
>>>>> before they make the H_JOIN call. This results in a hung system where
>>>>> every CPU is either in H_JOIN or offline.
>>>>>
>>>>> To prevent this, this patch disables CPU hotplug during the migration
>>>>> process.
>>>>>
>>>>> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>>>> ---
>>>>> arch/powerpc/kernel/rtas.c | 2 ++
>>>>> 1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>>>> index 8afd146bc9c7..2c7ed31c736e 100644
>>>>> --- a/arch/powerpc/kernel/rtas.c
>>>>> +++ b/arch/powerpc/kernel/rtas.c
>>>>> @@ -981,6 +981,7 @@ int rtas_ibm_suspend_me(u64 handle)
>>>>> goto out;
>>>>> }
>>>>>
>>>>> + cpu_hotplug_disable();
>>>>
>>>> So, some of the onlined CPUs (via
>>>> rtas_online_cpus_mask(offline_mask);) can still go offline
>>>> if userspace issues an offline command just before we execute
>>>> cpu_hotplug_disable().
>>>>
>>>> So we are narrowing the race, but it still exists. Am I missing something?
>>>
>>> You're correct, this narrows the window in which a CPU can go offline.
>>>
>>> In testing with this patch, we have not been able to re-create the failure,
>>> but there is still a small window.
>>
>> Well let's close it.
>>
>> We just need to check that all present CPUs are online after we've
>> called cpu_hotplug_disable(), don't we?
>
> Yes. However, we cannot use the cpu_up() API to bring the offline CPUs
> online, since it will return -EBUSY if CPU hotplug has been
> disabled. _cpu_up() works, but it is (understandably) a static
> function in kernel/cpu.c
>
> So, we might need new APIs along the lines of
> disable_nonboot_cpus()/enable_nonboot_cpus(),
> which are currently used by the suspend subsystem, except that we
> would need the APIs to
> - Disable hotplug and online all the CPUs in an atomic
> fashion. It would be good if the API returned the cpumask of
> CPUs that were offline and were brought online by this API.
>
> - Restore the state of the machine by offlining the CPUs which
> we brought online, and enable hotplug again.
>
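
The two proposed APIs can be sketched as a small userspace model. This is not kernel code: CPUs are array entries, model__cpu_up() stands in for the static _cpu_up() in kernel/cpu.c, model_cpu_up() stands in for cpu_up() and its -EBUSY check, and every function and variable name here is hypothetical. The "atomic" property only holds because the model is single-threaded; a kernel version would need to hold the CPU hotplug locking across both steps.

```c
/*
 * Userspace model only. All names are hypothetical; this is not the
 * kernel's cpu_up()/_cpu_up() implementation.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

static bool model_present[MODEL_NR_CPUS];
static bool model_online[MODEL_NR_CPUS];
static int hotplug_disabled;	/* nesting count, like cpu_hotplug_disable() */

/* Stands in for the internal bring-up path: no hotplug-disabled check. */
static int model__cpu_up(int cpu)
{
	model_online[cpu] = true;
	return 0;
}

/* Stands in for the public cpu_up(): refuses while hotplug is disabled. */
static int model_cpu_up(int cpu)
{
	if (hotplug_disabled)
		return -EBUSY;
	return model__cpu_up(cpu);
}

/*
 * First proposed API: disable hotplug, then online every present CPU.
 * Returns a bitmask of the CPUs that this call brought online.
 */
static uint32_t model_disable_hotplug_online_all(void)
{
	uint32_t brought_up = 0;

	hotplug_disabled++;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (model_present[cpu] && !model_online[cpu]) {
			model__cpu_up(cpu);
			brought_up |= 1u << cpu;
		}
	}
	return brought_up;
}

/*
 * Second proposed API: offline only the CPUs we brought online (an
 * internal path that bypasses the disabled check), then re-enable hotplug.
 */
static void model_restore_and_enable_hotplug(uint32_t brought_up)
{
	assert(hotplug_disabled > 0);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (brought_up & (1u << cpu))
			model_online[cpu] = false;
	hotplug_disabled--;
}
```

Returning the mask lets the restore step re-offline exactly the CPUs the first call onlined, leaving CPUs that were already online untouched.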

There is already code in the LPM path that saves a cpu mask of the offline
cpus prior to bringing them all online, so we can offline them again after
the migration.

The missing piece to fully close the window is an API that will allow us to
online cpus while cpu hotplug is disabled.
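
The re-check suggested earlier in the thread, combined with the existing save-and-restore of the offline mask, can be modeled the same way. In this sketch (userspace only, hypothetical names, not the rtas.c code) the re-check turns a CPU lost in the window into an -EBUSY error instead of a hang in H_JOIN. It cannot repair the situation, because onlining a CPU after hotplug is disabled is exactly the missing API described above.

```c
/*
 * Userspace model of the suspend-path ordering. Hypothetical names only.
 * In the real flow every present CPU must be online to make H_JOIN;
 * the model only tracks the online state.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS 4

struct lpm_model {
	bool present[NCPUS];
	bool online[NCPUS];
	bool hotplug_disabled;
};

/* Bitmask of present CPUs that are currently offline. */
static uint32_t offline_mask(const struct lpm_model *m)
{
	uint32_t mask = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (m->present[cpu] && !m->online[cpu])
			mask |= 1u << cpu;
	return mask;
}

/* A userspace offline request: refused once hotplug is disabled. */
static int user_offline(struct lpm_model *m, int cpu)
{
	if (m->hotplug_disabled)
		return -EBUSY;
	m->online[cpu] = false;
	return 0;
}

/*
 * Save the offline mask, online those CPUs, disable hotplug, then re-check.
 * 'racer' (-1 for none) simulates a userspace offline landing in the
 * window before hotplug is disabled.
 */
static int lpm_prepare(struct lpm_model *m, uint32_t *saved, int racer)
{
	*saved = offline_mask(m);
	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (*saved & (1u << cpu))
			m->online[cpu] = true;

	if (racer >= 0)
		user_offline(m, racer);	/* the remaining window */

	m->hotplug_disabled = true;

	/* The re-check: every present CPU must be online at this point. */
	if (offline_mask(m) != 0)
		return -EBUSY;	/* caller still runs lpm_finish() */
	return 0;
}

/* After migration (or on failure): re-offline saved CPUs, re-enable hotplug. */
static void lpm_finish(struct lpm_model *m, uint32_t saved)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (saved & (1u << cpu))
			m->online[cpu] = false;
	m->hotplug_disabled = false;
}
```

With a racer CPU, lpm_prepare() reports -EBUSY rather than proceeding with a CPU missing; in either case lpm_finish() uses the mask saved at the start to restore the original state.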

Since we have not been able to re-create the failure with this patch, would
it be OK to pull in this patch while other options are explored?

-Nathan
>>
>> cheers
>>

Thread overview: 12+ messages
2018-09-17 19:14 [PATCH] powerpc/pseries: Disable CPU hotplug across migrations Nathan Fontenot
2018-09-17 20:41 ` Tyrel Datwyler
2018-09-18 10:32 ` Gautham R Shenoy
2018-09-20 15:03 ` Nathan Fontenot
2018-09-24 7:00 ` Michael Ellerman
2018-09-24 8:56 ` Gautham R Shenoy
2018-09-24 14:30 ` Nathan Fontenot [this message]
2018-09-24 20:49 ` Tyrel Datwyler
2018-09-25 0:38 ` Michael Ellerman
2018-09-25 0:42 ` Michael Ellerman
2018-09-25 6:19 ` Gautham R Shenoy
2018-09-20 4:21 ` Michael Ellerman