[PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
@ 2015-11-13 10:08 Dario Faggioli
  2015-11-13 10:41 ` Jan Beulich
  2015-11-13 15:19 ` Juergen Gross
  0 siblings, 2 replies; 7+ messages in thread
From: Dario Faggioli @ 2015-11-13 10:08 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross

In fact, with 2 cpupools, one (the default) Credit and
one Credit2 (with at least 1 pCPU in the latter), trying
a (e.g., ACPI S3) suspend/resume crashes like this:

(XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
(XEN) [  150.587783] CPU:    6
(XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
(XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
(XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
(XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
(XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
(XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
(XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
(XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
... ... ...
(XEN) [  150.587962] Xen call trace:
(XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
(XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
(XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
(XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
(XEN) [  151.272182]
(XEN) [  151.274174] ****************************************
(XEN) [  151.279624] Panic on CPU 6:
(XEN) [  151.282915] Xen BUG at sched_credit.c:655
(XEN) [  151.287415] ****************************************

During suspend, the pCPUs are not removed from their
pools with the standard procedure (which would involve
schedule_cpu_switch(). During resume, they:
 1) are assigned to the default cpupool (CPU_UP_PREPARE
    phase);
 2) are moved to the pool they were in before suspend,
    via schedule_cpu_switch() (CPU_ONLINE phase)

During resume, scheduling (even if just the idle loop)
can happen right after the CPU_STARTING phase(before
CPU_ONLINE), i.e., before the pCPU is put back in its
pool. In this case, it is the default pool'sscheduler
that is invoked (Credit1, in the example above). But,
during suspend, the Credit2 specific vCPU data is not
being freed, and Credit1 specific vCPU data is not
allocated, during resume.

Therefore, Credit1 schedules on pCPUs whose idle vCPU's
sched_priv points to Credit2 vCPU data, and we crash.

Fix things by properly deallocating scheduler specific
data of the pCPU's pool scheduler during pCPU teardown,
and re-allocating them --always for &ops-- during pCPU
bringup.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
 xen/common/schedule.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 20f5f56..55fc32a 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1381,6 +1381,25 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
+    if ( idle_vcpu[cpu]->sched_priv == NULL )
+    {
+        struct vcpu *idle = idle_vcpu[cpu];
+
+        /*
+         * During (ACPI?) suspend the scheduler specific data (what's in
+         * sched_priv) of the idle vCPU for this cpu's cpupool's scheduler
+         * is freed. At this stage of the resuming path, we treat the pCPU
+         * as it belongs to the default pool. Hence, let's allocate the
+         * default scheduler's data of the idle vCPU of this pCPU. It is
+         * schedule_cpu_switch() (invoked later,  if necessary) that sets
+         * things up as they were before suspend.
+         */
+        idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle,
+                                    idle->domain->sched_priv);
+        if ( idle->sched_priv == NULL )
+            return -ENOMEM;
+    }
+
     if ( (ops.alloc_pdata != NULL) &&
          ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
         return -ENOMEM;
@@ -1393,9 +1412,13 @@ static void cpu_schedule_down(unsigned int cpu)
     struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
+    SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
     if ( sd->sched_priv != NULL )
         SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
 
+    idle_vcpu[cpu]->sched_priv = NULL;
+    sd->sched_priv = NULL;
+
     kill_timer(&sd->s_timer);
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 10:08 [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli
@ 2015-11-13 10:41 ` Jan Beulich
  2015-11-13 12:30   ` Dario Faggioli
  2015-11-13 15:19 ` Juergen Gross
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2015-11-13 10:41 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross

>>> On 13.11.15 at 11:08, <dario.faggioli@citrix.com> wrote:
> Fix things by properly deallocating scheduler specific
> data of the pCPU's pool scheduler during pCPU teardown,
> and re-allocating them --always for &ops-- during pCPU
> bringup.

Avoiding this was done for a reason iirc: What if one such allocation
fails (e.g. due to heap fragmentation resulting from the allocation
order not being the exact inverse of the freeing order)?

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 10:41 ` Jan Beulich
@ 2015-11-13 12:30   ` Dario Faggioli
  2015-11-13 12:52     ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Dario Faggioli @ 2015-11-13 12:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross

[-- Attachment #1.1: Type: text/plain, Size: 4426 bytes --]

On Fri, 2015-11-13 at 03:41 -0700, Jan Beulich wrote:
> > > > On 13.11.15 at 11:08, <dario.faggioli@citrix.com> wrote:
> > Fix things by properly deallocating scheduler specific
> > data of the pCPU's pool scheduler during pCPU teardown,
> > and re-allocating them --always for &ops-- during pCPU
> > bringup.
> 
> Avoiding this was done for a reason iirc: What if one such allocation
> fails (e.g. due to heap fragmentation resulting from the allocation
> order not being the exact inverse of the freeing order)?
> 
I don't think I'm getting the point.

cpu_schedule_up() allocates, with the alloc_pdata hook, stuff that was
freed by cpu_schedule_down (with the free_pdata hook) already.

It actually allocates two things, the other being the entire idle vCPU
of the pCPU, but that is only if idle_vcpu[cpu] is NULL, which should
not be happening during suspend/resume.

In any case, if one of these allocations fails, it already returns
ENOMEM. What happens then is, of course, caller's call. If it happens
during boot for pCPU 0, we BUG(). If it happens with non-boot pCPUs
(either at boot, during hotplug or during suspend/resume), we fail the
bringup of that pCPU at the CPU_UP_PREPARE phase.

This patch introduces one more occasion for the failure to occur, sure,
if there's no (proper) memory. Without this patch the system crashes
already, and even if there is enough (proper) memory, thuogh.

Furthermore, at a later point on the pCPU bringup path,
schedule_cpu_switch() is called. In the scenario described in the
changelog, what it will do is deallocating the existing scheduler
specific per-vCPU data, and re-allocating a new one. If the allocation
fails at that point, what happens is that we fail the bringup at the
CPU_ONLINE stage and (in cpu_up()) we BUG(). This does not really look
better than what happens with the patch applied to me... Quite the
opposite, TBH.

Actually, re-looking at schedule_cpu_switch(), the situation is
probably even worse than how I described it in the changelog! In fact,
still in the scenario introduced there, when we actually get to call
schedule_cpu_switch() (during CPU_ONLINE), the pCPU --let it be pCPU
6-- is in the following state:
 - it's still formally in cpupool1, but it's scheduler is set to
   &ops, i.e., to Credit1,
 - idle_vcpu[6].sched_priv points to Credit2 vCPU data,
 - it is being switched to pool1, with ops == Credit2.

So, schedule_cpu_switch() itself will:
 - allocate a new Credit2 per vCPU data structure,
 - make idle_vcpu[6].sched_priv point such new structure,
 - try to deallocate the still existing Credit2 per vCPU data 
   structure with Credit1's (old_ops in the function) free_vdata hook!!
 - bad things (or so I expect!)

Note, in particular, the the next to last item: we'd be calling
Credit1's csched_free_vdata() to deallocate something that was
allocated with Credit2's csched_alloc_vdata()... Which looks like
another rather interesting bug to me (I haven't been able to trigger
it, most likely because the one shown in the changelog manifests first,
but I can try, if we're curios :-D).

So, in summary, this patch fixes two bugs, one that is showing and one
latent... Not bad! :-)

Fact is, there is an inconsistency between the pCPU's scheduler (set in
cpu_schedule_up(), to Credit1, in the example) and the pCPU's idle
vCPU's private data (Credit2 data in the example), which, if left in
place, could explode somewhere else, at some future point. In my
opinion, the best solution is to remove such inconsistency, and that's
what the patch tries to do.

Another idea could be trying to completely and fully initialize the
_proper_ scheduler for the pCPU in cpu_schedule_up(), but it's not
guaranteed that this will be doable for all schedulers; e.g., it is
impossible for Credit2. Maybe, but I'd have to try, since it's only the
inconsistency in the status of the idle vCPU that triggers the bug, we
can at least set that straight in cpu_schedule_up()... As I said, I'd
have to check, and I'm fine with trying, if it is deemed worthwhile.

Let me know what you think.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 12:30   ` Dario Faggioli
@ 2015-11-13 12:52     ` Jan Beulich
  2015-11-13 14:25       ` Dario Faggioli
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2015-11-13 12:52 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross

>>> On 13.11.15 at 13:30, <dario.faggioli@citrix.com> wrote:
> On Fri, 2015-11-13 at 03:41 -0700, Jan Beulich wrote:
>> > > > On 13.11.15 at 11:08, <dario.faggioli@citrix.com> wrote:
>> > Fix things by properly deallocating scheduler specific
>> > data of the pCPU's pool scheduler during pCPU teardown,
>> > and re-allocating them --always for &ops-- during pCPU
>> > bringup.
>> 
>> Avoiding this was done for a reason iirc: What if one such allocation
>> fails (e.g. due to heap fragmentation resulting from the allocation
>> order not being the exact inverse of the freeing order)?
>> 
> I don't think I'm getting the point.
> 
> cpu_schedule_up() allocates, with the alloc_pdata hook, stuff that was
> freed by cpu_schedule_down (with the free_pdata hook) already.

If there is no change in the allocation pattern (i.e. including the
order of frees and allocs), then I'm sorry for the noise. I understood
the description of the change differently then.

> It actually allocates two things, the other being the entire idle vCPU
> of the pCPU, but that is only if idle_vcpu[cpu] is NULL, which should
> not be happening during suspend/resume.
> 
> In any case, if one of these allocations fails, it already returns
> ENOMEM. What happens then is, of course, caller's call. If it happens
> during boot for pCPU 0, we BUG(). If it happens with non-boot pCPUs
> (either at boot, during hotplug or during suspend/resume), we fail the
> bringup of that pCPU at the CPU_UP_PREPARE phase.

Except that after resume the system should get into the same state
it was before suspend, i.e. in particular same number of CPUs. Imo
there simply shouldn't be any allocations on the resume path -
everything should be possible to be picked up as it was left during
suspend. Along the lines of e.g. the handling of affinities we once
had (and maybe still have), getting saved away and restored instead
of trashed.

> So, in summary, this patch fixes two bugs, one that is showing and one
> latent... Not bad! :-)

Agreed.

> Fact is, there is an inconsistency between the pCPU's scheduler (set in
> cpu_schedule_up(), to Credit1, in the example) and the pCPU's idle
> vCPU's private data (Credit2 data in the example), which, if left in
> place, could explode somewhere else, at some future point. In my
> opinion, the best solution is to remove such inconsistency, and that's
> what the patch tries to do.

Removing the inconsistency is certainly a requirement. The question
just is how to do so without risking system health in other ways.

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 12:52     ` Jan Beulich
@ 2015-11-13 14:25       ` Dario Faggioli
  2015-11-13 14:47         ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Dario Faggioli @ 2015-11-13 14:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross

[-- Attachment #1.1: Type: text/plain, Size: 6859 bytes --]

On Fri, 2015-11-13 at 05:52 -0700, Jan Beulich wrote:
>>> On 13.11.15 at 13:30, <dario.faggioli@citrix.com> wrote:

> > cpu_schedule_up() allocates, with the alloc_pdata hook, stuff that
> > was
> > freed by cpu_schedule_down (with the free_pdata hook) already.
> 
> If there is no change in the allocation pattern (i.e. including the
> order of frees and allocs), then I'm sorry for the noise. I
> understood
> the description of the change differently then.
> 
There are changes. So, here it is the situation.

 Right now:                   After this patch:
 ---------                    -----------------
 * Boot / CPU switch pools:   * Boot / CPU switch pools:
    cpu_schedule_up():           cpu_schedule_up():=
     v0 = alloc_vdata()           v0 = alloc_vdata()
     p0 = alloc_pdata()           p0 = alloc_pdata()

 * Suspend:                   * Suspend:
    cpu_schedule_down()          cpu_schedule_down():
     free_pdata(p0)               free_vdata(v0)
                                  free_pdata(p0)     [x]

 * Resume:                    * Resume:
    cpu_schedule_up()            cpu_schedule_up():
     p1 = alloc_pdata()           v1 = alloc_vdata()
                                  p1 = alloc_pdata() [xx]
   [*]
    schedule_cpu_switch()        schedule_cpu_switch():
     p2 = alloc_pdata()           p2 = alloc_pdata()
     v1 = alloc_vdata()           v2 = alloc_vdata()
     free_vdata(v0) [**]          free_vdata(v1)
     free_pdata(p1)               free_pdata(p1)

[*] BUG, the one described in the changelog, if scheduling happens at 
    this point (I can reproduce this with 100% reliability)

[*] LATENT BUG, as it is possible that the free_vdata() function 

called here is not the counterpart of the alloc_vdata() function 

used for allocating v0

So, there are allocations already in the resume path. With this patch,
there is one more free in the suspend path and one more alloc in the
resume path.

I can change the order of operations at [x] and/or [xx] (and send a
v2), e.g., make things look like this, if that helps,:

* Suspend:
   cpu_schedule_down():
    free_pdata(p0)
    free_vdata(v0)

* Resume:
   cpu_schedule_up():
    v1 = alloc_vdata()
    p1 = alloc_pdata()

Would that work?

I don't think I neither can easily get away with the bug, nor eliminate
the explained inconsistency, without that additional free-alloc pair.

> > In any case, if one of these allocations fails, it already returns
> > ENOMEM. What happens then is, of course, caller's call. If it
> > happens
> > during boot for pCPU 0, we BUG(). If it happens with non-boot pCPUs
> > (either at boot, during hotplug or during suspend/resume), we fail
> > the
> > bringup of that pCPU at the CPU_UP_PREPARE phase.
> 
> Except that after resume the system should get into the same state
> it was before suspend, i.e. in particular same number of CPUs.
>
And those same CPUs being in the same pools, which is where this all
springs from.

That being said, I totally agree. But if something unexpected happens
(which is what we are talking about), like being in lack of memory, is
there any value in "limiting the damage"? I think there is, and that is
why I looked at and reported the failure path in details, when
answering your "what if alloc fail" question.

If there is no point in that, why do we handle the error and rollback
the CPU bringup in cpu_up() at all? We could just, at least in case of
resume, BUG_ON() every time we don't get a NOTIFY_DONE result... 

>  Imo
> there simply shouldn't be any allocations on the resume path -
> everything should be possible to be picked up as it was left during
> suspend. 
>
Agreed again. I'm not sure to what extent we can apply this to
scheduling per pCPU and per vCPU data. For example, per vCPU data was
not being deallocated (before my patch, I mean), but per pCPU data was.

I'm not sure why exactly this was the case in the first place, but I
guess it has to do with the fact that we, OTOH, want to actually
deallocate as much stuff as possible in the case of CPU hot unplug, and
parts of this code is shared between the two.

But it is also, IMO, related to the above reasoning on error handling.
I mean, if we just don't kill the per pCPU data the scheduler keeps for
CPU x, and then, during resume, CPU x fails to come up, what happens?
We risk leaking that data or, worse, failing at making the schedule
aware that he does not really have CPU x available any longer.

So, again, if we value "graceful" error compensation, I see some of
these re-allocation necessary. Perhaps, the work I'm doing in another
series of separating scheduler per pCPU data allocation and
initialization can be helpful (not against the risk of leaks, though).

> > Fact is, there is an inconsistency between the pCPU's scheduler
> > (set in
> > cpu_schedule_up(), to Credit1, in the example) and the pCPU's idle
> > vCPU's private data (Credit2 data in the example), which, if left
> > in
> > place, could explode somewhere else, at some future point. In my
> > opinion, the best solution is to remove such inconsistency, and
> > that's
> > what the patch tries to do.
> 
> Removing the inconsistency is certainly a requirement. The question
> just is how to do so without risking system health in other ways.
> 
I had a look. For instance, in Credit2, the scheduler proper
initialization happens only during CPU_STARTING (and needs to stay
there), so I can't attach the pCPU to a Credit2 instance during
CPU_UP_PREPARE, and claim that it's ready for scheduling.

Avoiding deallocating stuff, apart from the dissertation on importance
of error handling, would at least require treating the CPU hotplug and
suspend/resume cases differently, which doesn't look ideal.

Putting something together that would hold up until proper setup
happens (namely, during the time between cpu_schedule_up() and
schedule_cpu_switch()) is also less trivial than I thought. For
instance, I'd need to access per_cpu(cpupool,cpu) from
cpu_schedule_up(), which, from a quick test, it seems I can't. I don't
think it's worth to put much effort in this, considering that the
result would be an hack anyway!

So, all in all, the patch (maybe after reordering the free-s?) remains
the best way forward, IMO. It's possible that, in future, we'' be able
to improve things toward the 'no allocs in resume path' goal, and I'll
keep an eye out for that, when touching related areas.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 14:25       ` Dario Faggioli
@ 2015-11-13 14:47         ` Jan Beulich
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2015-11-13 14:47 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross

>>> On 13.11.15 at 15:25, <dario.faggioli@citrix.com> wrote:
> So, all in all, the patch (maybe after reordering the free-s?) remains
> the best way forward, IMO. It's possible that, in future, we'' be able
> to improve things toward the 'no allocs in resume path' goal, and I'll
> keep an eye out for that, when touching related areas.

Okay, I think you convinced me.

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
  2015-11-13 10:08 [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli
  2015-11-13 10:41 ` Jan Beulich
@ 2015-11-13 15:19 ` Juergen Gross
  1 sibling, 0 replies; 7+ messages in thread
From: Juergen Gross @ 2015-11-13 15:19 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap

On 13/11/15 11:08, Dario Faggioli wrote:
> In fact, with 2 cpupools, one (the default) Credit and
> one Credit2 (with at least 1 pCPU in the latter), trying
> a (e.g., ACPI S3) suspend/resume crashes like this:
> 
> (XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) [  150.587783] CPU:    6
> (XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
> (XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
> (XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
> (XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
> (XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
> (XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
> (XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
> ... ... ...
> (XEN) [  150.587962] Xen call trace:
> (XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
> (XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
> (XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
> (XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
> (XEN) [  151.272182]
> (XEN) [  151.274174] ****************************************
> (XEN) [  151.279624] Panic on CPU 6:
> (XEN) [  151.282915] Xen BUG at sched_credit.c:655
> (XEN) [  151.287415] ****************************************
> 
> During suspend, the pCPUs are not removed from their
> pools with the standard procedure (which would involve
> schedule_cpu_switch(). During resume, they:
>  1) are assigned to the default cpupool (CPU_UP_PREPARE
>     phase);
>  2) are moved to the pool they were in before suspend,
>     via schedule_cpu_switch() (CPU_ONLINE phase)
> 
> During resume, scheduling (even if just the idle loop)
> can happen right after the CPU_STARTING phase(before
> CPU_ONLINE), i.e., before the pCPU is put back in its
> pool. In this case, it is the default pool'sscheduler
> that is invoked (Credit1, in the example above). But,
> during suspend, the Credit2 specific vCPU data is not
> being freed, and Credit1 specific vCPU data is not
> allocated, during resume.
> 
> Therefore, Credit1 schedules on pCPUs whose idle vCPU's
> sched_priv points to Credit2 vCPU data, and we crash.
> 
> Fix things by properly deallocating scheduler specific
> data of the pCPU's pool scheduler during pCPU teardown,
> and re-allocating them --always for &ops-- during pCPU
> bringup.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-11-13 15:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-11-13 10:08 [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli
2015-11-13 10:41 ` Jan Beulich
2015-11-13 12:30   ` Dario Faggioli
2015-11-13 12:52     ` Jan Beulich
2015-11-13 14:25       ` Dario Faggioli
2015-11-13 14:47         ` Jan Beulich
2015-11-13 15:19 ` Juergen Gross

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.