From: George Dunlap <george.dunlap@citrix.com>
To: Julien Grall <julien.grall@arm.com>
Cc: Varun.Swara@arm.com, Dario Faggioli <dario.faggioli@citrix.com>,
Steve Capper <Steve.Capper@arm.com>,
Wei Liu <Wei.Liu2@citrix.com>,
Xen Devel <xen-devel@lists.xen.org>
Subject: Re: xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
Date: Tue, 3 May 2016 14:20:05 +0100 [thread overview]
Message-ID: <CAFLBxZajL5TR42TiFQ2Urw3YO6nfxnQihVkpbs7NQoiaaY9sGA@mail.gmail.com> (raw)
In-Reply-To: <5728A191.4080602@arm.com>
On Tue, May 3, 2016 at 2:03 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Dario and George,
>
> What is the status of this patch? It would be nice to have it for Xen 4.7 to
> avoid unwanted crash when secondary CPUs fails to come online.
Wei, can you put this on your release blockers list? (Julien, I take
it this should be a blocker, right?)
-George
>
> Regards,
>
> On 26/04/16 18:49, Dario Faggioli wrote:
>>
>> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
>>>
>>> Hi Dario,
>>>
>> Hi,
>>
>>> A couple of people have been reported Xen crash on the ARM64
>>> Foundation Model [1] with recent unstable.
>>>
>> Ok, thanks for reporting.
>>
>>> The crash seems to happen when Xen fails to bring up secondary CPUs
>>> (see stack trace below).
>>>
>> Ah... I see.
>>
>>> From my understanding, csched_free_pdata is trying to kill the
>>> timer spc->ticker. However the status of this timer is
>>> TIMER_STATUS_invalid.
>>>
>>> This is because csched_init_pdata has set a deadline for the
>>> timer (set_timer) and the softirq to schedule the timer has
>>> not yet happen (indeed Xen is still in early boot).
>>>
>> Yes, this is sort of what happens (only slightly different, see the
>> changelog of the atached patch patch).
>>
>>
>>> I am not sure how to fix this issue. How will you recommend
>>> to fix it?
>>>
>> Yeah, well, doing it cleanly includes a slight change in the scheduler
>> hooks API, IMO... and we indeed should do it cleanly :-))
>>
>> George, what do you think?
>>
>> Honestly, this is similar to what I was thinking to do already (I mean,
>> having an deinit_pdata hook, "symmetric" with the init_pdata one), when
>> working on that series, because I do think it's cleaner... then, I
>> abandoned the idea, as it looked to not be necessary... But apparently
>> it may actually be! :-)
>>
>> Let me know, and I'll resubmit the patch properly (together with
>> another bugfix I have in my queue).
>>
>> Dario
>> ---
>> commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
>> Author: Dario Faggioli <dario.faggioli@citrix.com>
>> Date: Tue Apr 26 17:42:36 2016 +0200
>>
>> xen: sched: fix killing an uninitialized timer in free_pdata.
>>
>> commit 64269d9365 "sched: implement .init_pdata in Credit,
>> Credit2 and RTDS" helped fixing Credit2 runqueues, and
>> the races we had in switching scheduler for pCPUs, but
>> introduced another issue. In fact, if CPU bringup fails
>> during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
>> but before CPU_STARTING) the CPU_UP_CANCELED notifier
>> would be executed, which calls the free_pdata hook.
>>
>> Such hook does two things: (1) undo the initialization
>> done inside the init_pdata hook; (2) free the memory
>> allocated by the alloc_pdata hook.
>>
>> However, in the failure path just described, it is possible
>> that only alloc_pdata has really been called, which is
>> potentially and issue (depending on what actually happens
>> inside the implementation of free_pdata).
>>
>> In fact, for Credit1 (the only scheduler that actually
>> allocates per-pCPU data), this result in calling kill_timer()
>> on a timer that had not yet been initialized, which causes
>> the following:
>>
>> (XEN) Xen call trace:
>> (XEN) [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
>> (XEN) [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
>> (XEN) [<00000000002208c0>]
>> sched_credit.c#csched_free_pdata+0xd8/0x114
>> (XEN) [<0000000000227a18>]
>> schedule.c#cpu_schedule_callback+0xc0/0x12c
>> (XEN) [<0000000000219944>] notifier_call_chain+0x78/0x9c
>> (XEN) [<00000000002015fc>] cpu_up+0x104/0x130
>> (XEN) [<000000000028f7c0>] start_xen+0xaf8/0xce0
>> (XEN) [<00000000810021d8>] 00000000810021d8
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at
>> timer.c:279
>> (XEN) ****************************************
>>
>> Solve this by making the scheduler hooks API symmetric again,
>> i.e., by adding an deinit_pdata hook and making it responsible
>> of undoing what init_pdata did, rather than asking to free_pdata
>> to do everything.
>>
>> This is cleaner and, in the case at hand, makes it possible to
>> only call free_pdata, which is the right thing to do, as only
>> allocation and no initialization was performed.
>>
>> Reported-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> ---
>> Cc: George Dunlap <george.dunlap@citrix.com>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Cc: Varun.Swara@arm.com
>> Cc: Steve Capper <Steve.Capper@arm.com>
>>
>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>> index bc36837..0a6a1b4 100644
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu
>> *new)
>> }
>>
>> static void
>> -csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +csched_free_pdata(const struct scheduler *ops, void *pcpu)
>> {
>> - struct csched_private *prv = CSCHED_PRIV(ops);
>> struct csched_pcpu *spc = pcpu;
>> - unsigned long flags;
>>
>> if ( spc == NULL )
>> return;
>>
>> + xfree(spc);
>> +}
>> +
>> +static void
>> +csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +{
>> + struct csched_private *prv = CSCHED_PRIV(ops);
>> + struct csched_pcpu *spc = pcpu;
>> + unsigned long flags;
>> +
>> + ASSERT(spc != NULL);
>> +
>> spin_lock_irqsave(&prv->lock, flags);
>>
>> prv->credit -= prv->credits_per_tslice;
>> @@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void
>> *pcpu, int cpu)
>> kill_timer(&prv->master_ticker);
>>
>> spin_unlock_irqrestore(&prv->lock, flags);
>> -
>> - xfree(spc);
>> }
>>
>> static void *
>> @@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = {
>> .free_vdata = csched_free_vdata,
>> .alloc_pdata = csched_alloc_pdata,
>> .init_pdata = csched_init_pdata,
>> + .deinit_pdata = csched_deinit_pdata,
>> .free_pdata = csched_free_pdata,
>> .switch_sched = csched_switch_sched,
>> .alloc_domdata = csched_alloc_domdata,
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index 46b9279..f4c37b4 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops,
>> unsigned int cpu,
>> }
>>
>> static void
>> -csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> {
>> unsigned long flags;
>> struct csched2_private *prv = CSCHED2_PRIV(ops);
>> struct csched2_runqueue_data *rqd;
>> int rqi;
>>
>> + ASSERT(pcpu == NULL);
>> +
>> spin_lock_irqsave(&prv->lock, flags);
>>
>> ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
>> @@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = {
>> .alloc_vdata = csched2_alloc_vdata,
>> .free_vdata = csched2_free_vdata,
>> .init_pdata = csched2_init_pdata,
>> - .free_pdata = csched2_free_pdata,
>> + .deinit_pdata = csched2_deinit_pdata,
>> .switch_sched = csched2_switch_sched,
>> .alloc_domdata = csched2_alloc_domdata,
>> .free_domdata = csched2_free_domdata,
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 5546999..1a64521 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu)
>> struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>> struct scheduler *sched = per_cpu(scheduler, cpu);
>>
>> - SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
>> + SCHED_OP(sched, free_pdata, sd->sched_priv);
>> SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
>>
>> idle_vcpu[cpu]->sched_priv = NULL;
>> @@ -1554,8 +1554,10 @@ static int cpu_schedule_callback(
>> case CPU_UP_PREPARE:
>> rc = cpu_schedule_up(cpu);
>> break;
>> - case CPU_UP_CANCELED:
>> case CPU_DEAD:
>> + SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>> + /* Fallthrough */
>> + case CPU_UP_CANCELED:
>> cpu_schedule_down(cpu);
>> break;
>> default:
>> @@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct
>> cpupool *c)
>> vpriv = SCHED_OP(new_ops, alloc_vdata, idle,
>> idle->domain->sched_priv);
>> if ( vpriv == NULL )
>> {
>> - SCHED_OP(new_ops, free_pdata, ppriv, cpu);
>> + SCHED_OP(new_ops, free_pdata, ppriv);
>> return -ENOMEM;
>> }
>>
>> @@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct
>> cpupool *c)
>> SCHED_OP(new_ops, tick_resume, cpu);
>>
>> SCHED_OP(old_ops, free_vdata, vpriv_old);
>> - SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
>> + SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
>> + SCHED_OP(old_ops, free_pdata, ppriv_old);
>>
>> out:
>> per_cpu(cpupool, cpu) = c;
>> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
>> index 1db7c8d..240f66c 100644
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -135,9 +135,10 @@ struct scheduler {
>> void (*free_vdata) (const struct scheduler *, void *);
>> void * (*alloc_vdata) (const struct scheduler *, struct
>> vcpu *,
>> void *);
>> - void (*free_pdata) (const struct scheduler *, void *,
>> int);
>> + void (*free_pdata) (const struct scheduler *, void *);
>> void * (*alloc_pdata) (const struct scheduler *, int);
>> void (*init_pdata) (const struct scheduler *, void *,
>> int);
>> + void (*deinit_pdata) (const struct scheduler *, void *,
>> int);
>> void (*free_domdata) (const struct scheduler *, void *);
>> void * (*alloc_domdata) (const struct scheduler *, struct
>> domain *);
>>
>>
>
> --
> Julien Grall
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2016-05-03 13:20 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-26 14:25 xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279 Julien Grall
2016-04-26 17:49 ` Dario Faggioli
2016-04-26 18:05 ` Julien Grall
2016-04-27 13:43 ` George Dunlap
2016-04-27 14:05 ` Dario Faggioli
2016-04-27 14:29 ` George Dunlap
2016-05-03 13:03 ` Julien Grall
2016-05-03 13:20 ` George Dunlap [this message]
2016-05-03 13:22 ` Julien Grall
2016-05-03 13:23 ` Wei Liu
2016-05-03 21:52 ` Dario Faggioli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAFLBxZajL5TR42TiFQ2Urw3YO6nfxnQihVkpbs7NQoiaaY9sGA@mail.gmail.com \
--to=george.dunlap@citrix.com \
--cc=Steve.Capper@arm.com \
--cc=Varun.Swara@arm.com \
--cc=Wei.Liu2@citrix.com \
--cc=dario.faggioli@citrix.com \
--cc=julien.grall@arm.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).