xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Mirela Simonovic <mirela.simonovic@aggios.com>
To: Julien Grall <julien.grall@arm.com>, Dario Faggioli <dfaggioli@suse.com>
Cc: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Davorin Mista <dm@aggios.com>,
	Xen Devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot
Date: Thu, 10 May 2018 13:57:25 +0200	[thread overview]
Message-ID: <CAKPH-NhUyxO1vOJnJPFayw71F5SfY87vWchoQybcH1e6Z4Mg2A@mail.gmail.com> (raw)
In-Reply-To: <c713cc00-ace2-f528-563f-175657dccb12@arm.com>

Hi,

+Dario

On Wed, May 9, 2018 at 6:32 PM, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 09/05/18 16:48, Mirela Simonovic wrote:
>>
>> Hi Julien,
>
>
> Hi Mirela,
>
>
>> On Mon, Apr 30, 2018 at 6:09 PM, Julien Grall <julien.grall@arm.com>
>> wrote:
>>>
>>> Hi Mirela,
>>>
>>>
>>> On 27/04/18 18:12, Mirela Simonovic wrote:
>>>>
>>>>
>>>> On boot, enabling errata workarounds will be triggered by the boot CPU
>>>> from start_xen(). On CPU hotplug (non-boot scenario) this would not be
>>>> done. This patch adds the code required to enable errata workarounds
>>>> for a CPU being hotplugged after the system boots. This is triggered
>>>> using a notifier. If the CPU fails to enable the errata Xen will panic.
>>>> This is done because it is assumed that the CPU which is hotplugged
>>>> after the system/Xen boots, was initially hotplugged during the
>>>> system/Xen boot. Therefore, enabling errata workarounds should never
>>>> fail.
>>>>
>>>> Signed-off-by: Mirela Simonovic <mirela.simonovic@aggios.com>
>>>>
>>>> ---
>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>>> CC: Julien Grall <julien.grall@arm.com>
>>>> ---
>>>>    xen/arch/arm/cpuerrata.c         | 35
>>>> +++++++++++++++++++++++++++++++++++
>>>>    xen/arch/arm/cpufeature.c        | 23 +++++++++++++++++++++++
>>>>    xen/include/asm-arm/cpufeature.h |  1 +
>>>>    3 files changed, 59 insertions(+)
>>>>
>>>> diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
>>>> index 1baa20654b..4040f781ec 100644
>>>> --- a/xen/arch/arm/cpuerrata.c
>>>> +++ b/xen/arch/arm/cpuerrata.c
>>>> @@ -5,6 +5,8 @@
>>>>    #include <xen/spinlock.h>
>>>>    #include <xen/vmap.h>
>>>>    #include <xen/warning.h>
>>>> +#include <xen/notifier.h>
>>>> +#include <xen/cpu.h>
>>>>    #include <asm/cpufeature.h>
>>>>    #include <asm/cpuerrata.h>
>>>>    #include <asm/psci.h>
>>>> @@ -349,6 +351,39 @@ void __init enable_errata_workarounds(void)
>>>>        enable_cpu_capabilities(arm_errata);
>>>>    }
>>>>    +static int cpu_errata_callback(
>>>> +    struct notifier_block *nfb, unsigned long action, void *hcpu)
>>>> +{
>>>> +    switch ( action )
>>>> +    {
>>>> +    case CPU_STARTING:
>>>> +        enable_nonboot_cpu_caps(arm_errata);
>>>> +        break;
>>>> +    default:
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    return NOTIFY_DONE;
>>>> +}
>>>> +
>>>> +static struct notifier_block cpu_errata_nfb = {
>>>> +    .notifier_call = cpu_errata_callback,
>>>> +};
>>>> +
>>>> +static int __init cpu_errata_notifier_init(void)
>>>> +{
>>>> +    register_cpu_notifier(&cpu_errata_nfb);
>>>> +    return 0;
>>>> +}
>>>> +/*
>>>> + * Initialization has to be done at init rather than presmp_init phase
>>>> because
>>>> + * the callback should execute only after the secondary CPUs are
>>>> initially
>>>> + * booted (in hotplug scenarios when the system state is not boot). On
>>>> boot,
>>>> + * the enabling of errata workarounds will be triggered by the boot CPU
>>>> from
>>>> + * start_xen().
>>>> + */
>>>> +__initcall(cpu_errata_notifier_init);
>>>> +
>>>>    /*
>>>>     * Local variables:
>>>>     * mode: C
>>>> diff --git a/xen/arch/arm/cpufeature.c b/xen/arch/arm/cpufeature.c
>>>> index 525b45e22f..dd30f0d29c 100644
>>>> --- a/xen/arch/arm/cpufeature.c
>>>> +++ b/xen/arch/arm/cpufeature.c
>>>> @@ -68,6 +68,29 @@ void __init enable_cpu_capabilities(const struct
>>>> arm_cpu_capabilities *caps)
>>>>        }
>>>>    }
>>>>    +/* Run through the enabled capabilities and enable() them on the
>>>> calling CPU */
>>>> +void enable_nonboot_cpu_caps(const struct arm_cpu_capabilities *caps)
>>>> +{
>>>> +    ASSERT(system_state != SYS_STATE_boot);
>>>> +
>>>> +    for ( ; caps->matches; caps++ )
>>>> +    {
>>>> +        if ( !cpus_have_cap(caps->capability) )
>>>> +            continue;
>>>> +
>>>> +        if ( caps->enable )
>>>> +        {
>>>> +            /*
>>>> +             * Since the CPU has enabled errata workarounds on boot, it
>>>> should
>>>
>>>
>>>
>>> This function is not really about errata, it is about capabilities.
>>> Errata
>>> is just a sub-category of them.
>>>
>>
>> I've fixed the comment, thanks.
>>
>>>> +             * never fail to enable them here.
>>>> +             */
>>>> +            if ( caps->enable((void *)caps) )
>>>> +                panic("CPU%u failed to enable capability %u\n",
>>>> +                      smp_processor_id(), caps->capability);
>>>
>>>
>>>
>>> We should really avoid to use panic(...) if this is something the system
>>> can
>>> survive. In that specific case, it would only affect the current CPU. So
>>> it
>>> would be better to return an error and let the caller decide what to do.
>>>
>>
>> I need to emphasize two points:
>> 1) I don't see how is this different compared to PSCI CPU OFF where we
>> do panic. Essentially, in both cases the system will not be able to
>> use that CPU and we already agreed that is a good reason to panic.
>
>
> You can't compare PSCI CPU off and the enable callback failing. The *only*
> reason PSCI CPU off can fail is because the Trusted OS is resident on that
> CPU. If that ever happen it is a programming error on Xen, and it makes
> sense to fail because you don't want that CPU to spin in Xen.
>
> Enabling a capability can fail because of a failure of allocating memory or
> mapping (see spectre workaround). It is not a programming error but an
> expected behavior and it is not a valid reason to assume we want to kill the
> system.
>
>> As oppose to CPU_OFF which wasn't called on boot so we indeed have no
>> idea whether it will pass on suspend, no matter how unlikely it could
>> fail, in this scenario we are sure that enabling capability should
>> pass because it already passed on boot. So if it doesn't pass, which I
>> consider to be impossible, I believe we should panic.
>> On the other hand, I understand how would this make a difference on
>> big.LITTLE where you try to hotplug a CPU that was never booted.
>> However, that scenario is out of this scope.
>
> While I agree that big.LITTLE is out of scope of your series, what I ask has
> nothing to do with big.LITTLE. There are valid reason for the enable
> callback to fail whether it is the case today or not.
>
>>
>> 2) I still wanted to give a chance to your proposal and just convert
>> panic into stop_cpu+printing error. The system cannot survive if
>> enabling a capability fails. In order to test this I added a
>> capability that will always fail after the boot. This is not realistic
>> in my opinion, but I used it only for testing to check whether the
>> system will survive. Instead of panic I printed an error and stopped
>> the CPU. However, Xen crashed. The boot CPU properly concluded that
>> the erroneous CPU will never become online, but later on credit
>> scheduler's assertion fails.
>
>
> Please provide more details.
>
>> I believe this is something that a person
>> who adds big.LITTLE support should deal with.
>
>
> If there is a bug in the scheduler it should be fixed rather trying to
> workaround with a panic in the code. If you provide more details, we might
> be able to help here.
>

This flow seems to have several bugs. Lets start from here:

Please take a look at function cpu_schedule_callback in schedule.c.
Within switch, case CPU_DEAD doesn't have a break, causing the bellow
CPU_UP_CANCELED to execute as well when the CPU goes down. This looks
wrong to me.
Dario, could you please confirm that this is a bug? Otherwise could
you please confirm the reasoning beyond?

Thanks,
Mirela

>>
>> Do we have an agreement to keep panic?
>
>
> I am afraid not, panic (and BUG*) should only be used when there are no way
> to come back or it is a programming error to end up here. I don't think this
> is the case with the information I have in hand.
>
> The two solutions I find acceptable would be:
>         1) Log a warning and ignore the error. Likely your CPU will break
> later on.
>         2) Return an error and let the caller deal with it. The caller might
> decide to kill the system, but that's not our business. This function should
> only report an error.
>
> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2018-05-10 11:57 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-27 17:12 [PATCH v3 00/10] xen/arm64: Suspend preconditions and CPU hotplug fixes Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 01/10] xen/arm64: Added handling of the trapped access to OSLSR register Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 02/10] xen/arm: Ignore write to GICD_ISACTIVERn registers (vgic-v2) Mirela Simonovic
2018-04-30 14:32   ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 03/10] xen/arm: Implement CPU_OFF PSCI call (physical interface) Mirela Simonovic
2018-04-30 14:36   ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 04/10] xen/arm: Remove __initdata and __init to enable CPU hotplug Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 05/10] xen/arm: Setup virtual paging for non-boot CPUs on hotplug/resume Mirela Simonovic
2018-04-30 14:47   ` Julien Grall
2018-05-07 14:55     ` Mirela Simonovic
2018-05-08 14:14       ` Julien Grall
2018-05-08 14:28         ` Mirela Simonovic
2018-05-09 10:32           ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 06/10] xen/common: Restore IRQ affinity when hotplugging a pCPU Mirela Simonovic
2018-05-10  7:33   ` Dario Faggioli
2018-04-27 17:12 ` [PATCH v3 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged Mirela Simonovic
2018-04-30 14:51   ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 08/10] xen/arm: Release timer interrupts " Mirela Simonovic
2018-04-30 15:58   ` Julien Grall
2018-05-07 15:33     ` Mirela Simonovic
2018-04-27 17:12 ` [PATCH v3 09/10] xen/arm: Free memory allocated for sibling/core maps on CPU hot-unplug Mirela Simonovic
2018-04-30 15:55   ` Julien Grall
2018-04-27 17:12 ` [PATCH v3 10/10] xen/arm: Enable errata for secondary CPU on hotplug after the boot Mirela Simonovic
2018-04-30 16:09   ` Julien Grall
2018-05-09 15:48     ` Mirela Simonovic
2018-05-09 16:32       ` Julien Grall
2018-05-10 11:57         ` Mirela Simonovic [this message]
2018-05-10 13:24           ` Mirela Simonovic
2018-05-10 13:29             ` Julien Grall
2018-05-10 14:12               ` Mirela Simonovic
2018-05-10 14:35                 ` Julien Grall
2018-05-10 14:25             ` Dario Faggioli
2018-05-10 15:00               ` Mirela Simonovic
2018-05-10 15:13                 ` Julien Grall
2018-05-10 15:49                   ` Mirela Simonovic
2018-05-10 16:02                     ` Julien Grall
2018-05-10 16:21                       ` Dario Faggioli
2018-05-10 16:24                     ` Dario Faggioli
2018-05-11 10:41                       ` Mirela Simonovic
2018-05-11 10:54                         ` Julien Grall
2018-05-11 12:07                           ` Mirela Simonovic
2018-05-11 12:20                             ` Mirela Simonovic
2018-05-11 13:08                               ` Julien Grall
2018-05-11 13:31                                 ` Dario Faggioli
     [not found]                                   ` <CAKPH-Nj2znmdcjZEfFf83WmrcBS_u4R33yPOxAPWw37RHVZ38g@mail.gmail.com>
2018-05-11 14:14                                     ` Dario Faggioli
2018-05-11 21:47                                   ` Stefano Stabellini
2018-05-14  9:44                                     ` Julien Grall
2018-05-14 16:59                                       ` Stefano Stabellini
2018-05-15 11:45                                         ` Mirela Simonovic
2018-05-11 13:12                           ` Dario Faggioli
2018-05-11 13:01                         ` Dario Faggioli
2018-05-10 16:12                   ` Dario Faggioli
2018-05-10 13:33           ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKPH-NhUyxO1vOJnJPFayw71F5SfY87vWchoQybcH1e6Z4Mg2A@mail.gmail.com \
    --to=mirela.simonovic@aggios.com \
    --cc=dfaggioli@suse.com \
    --cc=dm@aggios.com \
    --cc=edgar.iglesias@xilinx.com \
    --cc=julien.grall@arm.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).