From: Mirela Simonovic <mirela.simonovic@aggios.com>
To: Julien Grall <julien.grall@arm.com>, Dario Faggioli <dfaggioli@suse.com>
Cc: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Davorin Mista <dm@aggios.com>,
Xen Devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Date: Thu, 10 May 2018 13:57:25 +0200 [thread overview]
Message-ID: <CAKPH-NhUyxO1vOJnJPFayw71F5SfY87vWchoQybcH1e6Z4Mg2A@mail.gmail.com> (raw)
In-Reply-To: <c713cc00-ace2-f528-563f-175657dccb12@arm.com>
Hi,
+Dario
On Wed, May 9, 2018 at 6:32 PM, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 09/05/18 16:48, Mirela Simonovic wrote:
>>
>> Hi Julien,
>
>
> Hi Mirela,
>
>
>> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall <julien.grall@arm.com>
>> wrote:
>>>
>>> Hi Mirela,
>>>
>>>
>>> On 27/04/18 18:12, Mirela Simonovic wrote:
>>>>
>>>>
>>>> On boot, enabling errata workarounds will be triggered by the boot CPU
>>>> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
>>>> done. This patch adds the code required to enable errata workarounds
>>>> for a CPU being hotplugged after the system boots. This is triggered
>>>> using a notifier. If the CPU fails to enable the errata Xen will panic.
>>>> This is done because it is assumed that the CPU which is hotplugged
>>>> after the system/Xen boots, was initially hotplugged during the
>>>> system/Xen boot. Therefore, enabling errata workarounds should never
>>>> fail.
>>>>
>>>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>>>
>>>> ---
>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>>> CC: Julien Grall <julien.grall@arm.com>
>>>> ---
>>>> xen/arch/arm/cpuerrata.c | 35
>>>> +++++++++++++++++++++++++++++++++++
>>>> xen/arch/arm/cpufeature.c | 23 +++++++++++++++++++++++
>>>> xen/include/asm-arm/cpufeature.h | 1 +
>>>> 3 files changed, 59 insertions(+)
>>>>
>>>> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
>>>> index 1baa20654b..4040f781ec 100644
>>>> --- a/xen/arch/arm/cpuerrata.c
>>>> +++ b/xen/arch/arm/cpuerrata.c
>>>> @@ -5,6 +5,8 @@
>>>> #include <xen/spinlock.h>
>>>> #include <xen/vmap.h>
>>>> #include <xen/warning.h>
>>>> +#include <xen/notifier.h>
>>>> +#include <xen/cpu.h>
>>>> #include <asm/cpufeature.h>
>>>> #include <asm/cpuerrata.h>
>>>> #include <asm/psci.h>
>>>> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>>>> enable_cpu_capabilities(arm_errata);
>>>> }
>>>> +static int cpu_errata_callback(
>>>> + struct notifier_block *nfb, unsigned long action, void *hcpu)
>>>> +{
>>>> + switch ( action )
>>>> + {
>>>> + case CPU_STARTING:
>>>> + enable_nonboot_cpu_caps(arm_errata);
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>> +
>>>> + return NOTIFY_DONE;
>>>> +}
>>>> +
>>>> +static struct notifier_block cpu_errata_nfb = {
>>>> + .notifier_call = cpu_errata_callback,
>>>> +};
>>>> +
>>>> +static int __init cpu_errata_notifier_init(void)
>>>> +{
>>>> + register_cpu_notifier(&cpu_errata_nfb);
>>>> + return 0;
>>>> +}
>>>> +/*
>>>> + * Initialization has to be done at init rather than presmp_init phase
>>>> because
>>>> + * the callback should execute only after the secondary CPUs are
>>>> initially
>>>> + * booted (in hotplug scenarios when the system state is not boot). On
>>>> boot,
>>>> + * the enabling of errata workarounds will be triggered by the boot CPU
>>>> from
>>>> + * start_xen().
>>>> + */
>>>> +__initcall(cpu_errata_notifier_init);
>>>> +
>>>> /*
>>>> * Local variables:
>>>> * mode: C
>>>> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
>>>> index 525b45e22f..dd30f0d29c 100644
>>>> --- a/xen/arch/arm/cpufeature.c
>>>> +++ b/xen/arch/arm/cpufeature.c
>>>> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
>>>> arm_cpu_capabilities *caps)
>>>> }
>>>> }
>>>> +/* Run through the enabled capabilities and enable() them on the
>>>> calling CPU */
>>>> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
>>>> +{
>>>> + ASSERT(system_state != SYS_STATE_boot);
>>>> +
>>>> + for ( ; caps->matches; caps++ )
>>>> + {
>>>> + if ( !cpus_have_cap(caps->capability) )
>>>> + continue;
>>>> +
>>>> + if ( caps->enable )
>>>> + {
>>>> + /*
>>>> + * Since the CPU has enabled errata workarounds on boot, it
>>>> should
>>>
>>>
>>>
>>> This function is not really about errata, it is about capabilities.
>>> Errata
>>> is just a sub-category of them.
>>>
>>
>> I've fixed the comment, thanks.
>>
>>>> + * never fail to enable them here.
>>>> + */
>>>> + if ( caps->enable((void *)caps) )
>>>> + panic("CPU%u failed to enable capability %u\n",
>>>> + smp_processor_id(), caps->capability);
>>>
>>>
>>>
>>> We should really avoid to use panic(...) if this is something the system
>>> can
>>> survive. In that specific case, it would only affect the current CPU. So
>>> it
>>> would be better to return an error and let the caller decide what to do.
>>>
>>
>> I need to emphasize two points:
>> 1) I don't see how is this different compared to PSCI CPU OFF where we
>> do panic. Essentially, in both cases the system will not be able to
>> use that CPU and we already agreed that is a good reason to panic.
>
>
> You can't compare PSCI CPU off and the enable callback failing. The *only*
> reason PSCI CPU off can fail is because the Trusted OS is resident on that
> CPU. If that ever happen it is a programming error on Xen, and it makes
> sense to fail because you don't want that CPU to spin in Xen.
>
> Enabling a capability can fail because of a failure of allocating memory or
> mapping (see spectre workaround). It is not a programming error but an
> expected behavior and it is not a valid reason to assume we want to kill the
> system.
>
>> As oppose to CPU_OFF which wasn't called on boot so we indeed have no
>> idea whether it will pass on suspend, no matter how unlikely it could
>> fail, in this scenario we are sure that enabling capability should
>> pass because it already passed on boot. So if it doesn't pass, which I
>> consider to be impossible, I believe we should panic.
>> On the other hand, I understand how would this make a difference on
>> big.LITTLE where you try to hotplug a CPU that was never booted.
>> However, that scenario is out of this scope.
>
> While I agree that big.LITTLE is out of scope of your series, what I ask has
> nothing to do with big.LITTLE. There are valid reason for the enable
> callback to fail whether it is the case today or not.
>
>>
>> 2) I still wanted to give a chance to your proposal and just convert
>> panic into stop_cpu+printing error. The system cannot survive if
>> enabling a capability fails. In order to test this I added a
>> capability that will always fail after the boot. This is not realistic
>> in my opinion, but I used it only for testing to check whether the
>> system will survive. Instead of panic I printed an error and stopped
>> the CPU. However, Xen crashed. The boot CPU properly concluded that
>> the erroneous CPU will never become online, but later on credit
>> scheduler's assertion fails.
>
>
> Please provide more details.
>
>> I believe this is something that a person
>> who adds big.LITTLE support should deal with.
>
>
> If there is a bug in the scheduler it should be fixed rather trying to
> workaround with a panic in the code. If you provide more details, we might
> be able to help here.
>
This flow seems to have several bugs. Lets start from here:
Please take a look at function cpu_schedule_callback in schedule.c.
Within switch, case CPU_DEAD doesn't have a break, causing the bellow
CPU_UP_CANCELED to execute as well when the CPU goes down. This looks
wrong to me.
Dario, could you please confirm that this is a bug? Otherwise could
you please confirm the reasoning beyond?
Thanks,
Mirela
>>
>> Do we have an agreement to keep panic?
>
>
> I am afraid not, panic (and BUG*) should only be used when there are no way
> to come back or it is a programming error to end up here. I don't think this
> is the case with the information I have in hand.
>
> The two solutions I find acceptable would be:
> 1) Log a warning and ignore the error. Likely your CPU will break
> later on.
> 2) Return an error and let the caller deal with it. The caller might
> decide to kill the system, but that's not our business. This function should
> only report an error.
>
> Cheers,
>
> --
> Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
next prev parent reply other threads:[~2018-05-10 11:57 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 01/10] xen/arm64: Added handling of the trapped access to OSLSR register Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 02/10] xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2) Mirela Simonovic
2018-04-30 14:32 ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 03/10] xen/arm: Implement CPU_OFF PSCI call (physical interface) Mirela Simonovic
2018-04-30 14:36 ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 04/10] xen/arm: Remove __initdata and __init to enable CPU hotplug Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume Mirela Simonovic
2018-04-30 14:47 ` Julien Grall
2018-05-07 14:55 ` Mirela Simonovic
2018-05-08 14:14 ` Julien Grall
2018-05-08 14:28 ` Mirela Simonovic
2018-05-09 10:32 ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 06/10] xen/common: Restore IRQ affinity when hotplugging a pCPU Mirela Simonovic
2018-05-10 7:33 ` Dario Faggioli
2018-04-27 17:12 ` [PATCH v3 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged Mirela Simonovic
2018-04-30 14:51 ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 08/10] xen/arm: Release timer interrupts " Mirela Simonovic
2018-04-30 15:58 ` Julien Grall
2018-05-07 15:33 ` Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 09/10] xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug Mirela Simonovic
2018-04-30 15:55 ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot Mirela Simonovic
2018-04-30 16:09 ` Julien Grall
2018-05-09 15:48 ` Mirela Simonovic
2018-05-09 16:32 ` Julien Grall
2018-05-10 11:57 ` Mirela Simonovic [this message]
2018-05-10 13:24 ` Mirela Simonovic
2018-05-10 13:29 ` Julien Grall
2018-05-10 14:12 ` Mirela Simonovic
2018-05-10 14:35 ` Julien Grall
2018-05-10 14:25 ` Dario Faggioli
2018-05-10 15:00 ` Mirela Simonovic
2018-05-10 15:13 ` Julien Grall
2018-05-10 15:49 ` Mirela Simonovic
2018-05-10 16:02 ` Julien Grall
2018-05-10 16:21 ` Dario Faggioli
2018-05-10 16:24 ` Dario Faggioli
2018-05-11 10:41 ` Mirela Simonovic
2018-05-11 10:54 ` Julien Grall
2018-05-11 12:07 ` Mirela Simonovic
2018-05-11 12:20 ` Mirela Simonovic
2018-05-11 13:08 ` Julien Grall
2018-05-11 13:31 ` Dario Faggioli
[not found] ` <CAKPH-Nj2znmdcjZEfFf83WmrcBS_u4R33yPOxAPWw37RHVZ38g@mail.gmail.com>
2018-05-11 14:14 ` Dario Faggioli
2018-05-11 21:47 ` Stefano Stabellini
2018-05-14 9:44 ` Julien Grall
2018-05-14 16:59 ` Stefano Stabellini
2018-05-15 11:45 ` Mirela Simonovic
2018-05-11 13:12 ` Dario Faggioli
2018-05-11 13:01 ` Dario Faggioli
2018-05-10 16:12 ` Dario Faggioli
2018-05-10 13:33 ` Dario Faggioli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAKPH-NhUyxO1vOJnJPFayw71F5SfY87vWchoQybcH1e6Z4Mg2A@mail.gmail.com \
--to=mirela.simonovic@aggios.com \
--cc=dfaggioli@suse.com \
--cc=dm@aggios.com \
--cc=edgar.iglesias@xilinx.com \
--cc=julien.grall@arm.com \
--cc=sstabellini@kernel.org \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).