* [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
@ 2015-11-13 10:08 Dario Faggioli
2015-11-13 10:41 ` Jan Beulich
2015-11-13 15:19 ` Juergen Gross
0 siblings, 2 replies; 7+ messages in thread
From: Dario Faggioli @ 2015-11-13 10:08 UTC (permalink / raw)
To: xen-devel; +Cc: George Dunlap, Juergen Gross
In fact, with 2 cpupools, one (the default) Credit and
one Credit2 (with at least 1 pCPU in the latter), trying
a (e.g., ACPI S3) suspend/resume crashes like this:
(XEN) [ 150.587779] ----[ Xen-4.7-unstable x86_64 debug=y Not tainted ]----
(XEN) [ 150.587783] CPU: 6
(XEN) [ 150.587786] RIP: e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [ 150.587796] RFLAGS: 0000000000010086 CONTEXT: hypervisor
(XEN) [ 150.587801] rax: ffff83031fa3c020 rbx: ffff830322c1b4b0 rcx: 0000000000000000
(XEN) [ 150.587806] rdx: ffff83031fa78000 rsi: 000000000000000a rdi: ffff82d0802a9788
(XEN) [ 150.587811] rbp: ffff83031fa7fe20 rsp: ffff83031fa7fd30 r8: ffff83031fa80000
(XEN) [ 150.587815] r9: 0000000000000006 r10: 000000000008f7f2 r11: 0000000000000006
(XEN) [ 150.587819] r12: ffff8300dbdf3000 r13: ffff830322c1b4b0 r14: 0000000000000006
(XEN) [ 150.587823] r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026e0
(XEN) [ 150.587827] cr3: 00000000dbaa8000 cr2: 0000000000000000
(XEN) [ 150.587830] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) [ 150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
... ... ...
(XEN) [ 150.587962] Xen call trace:
(XEN) [ 150.587966] [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [ 150.587974] [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
(XEN) [ 150.587979] [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
(XEN) [ 150.587983] [<ffff82d08012dc6e>] do_softirq+0x13/0x15
(XEN) [ 150.587988] [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
(XEN) [ 151.272182]
(XEN) [ 151.274174] ****************************************
(XEN) [ 151.279624] Panic on CPU 6:
(XEN) [ 151.282915] Xen BUG at sched_credit.c:655
(XEN) [ 151.287415] ****************************************
During suspend, the pCPUs are not removed from their
pools with the standard procedure (which would involve
schedule_cpu_switch(). During resume, they:
1) are assigned to the default cpupool (CPU_UP_PREPARE
phase);
2) are moved to the pool they were in before suspend,
via schedule_cpu_switch() (CPU_ONLINE phase)
During resume, scheduling (even if just the idle loop)
can happen right after the CPU_STARTING phase(before
CPU_ONLINE), i.e., before the pCPU is put back in its
pool. In this case, it is the default pool'sscheduler
that is invoked (Credit1, in the example above). But,
during suspend, the Credit2 specific vCPU data is not
being freed, and Credit1 specific vCPU data is not
allocated, during resume.
Therefore, Credit1 schedules on pCPUs whose idle vCPU's
sched_priv points to Credit2 vCPU data, and we crash.
Fix things by properly deallocating scheduler specific
data of the pCPU's pool scheduler during pCPU teardown,
and re-allocating them --always for &ops-- during pCPU
bringup.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
xen/common/schedule.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 20f5f56..55fc32a 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1381,6 +1381,25 @@ static int cpu_schedule_up(unsigned int cpu)
if ( idle_vcpu[cpu] == NULL )
return -ENOMEM;
+ if ( idle_vcpu[cpu]->sched_priv == NULL )
+ {
+ struct vcpu *idle = idle_vcpu[cpu];
+
+ /*
+ * During (ACPI?) suspend the scheduler specific data (what's in
+ * sched_priv) of the idle vCPU for this cpu's cpupool's scheduler
+ * is freed. At this stage of the resuming path, we treat the pCPU
+ * as it belongs to the default pool. Hence, let's allocate the
+ * default scheduler's data of the idle vCPU of this pCPU. It is
+ * schedule_cpu_switch() (invoked later, if necessary) that sets
+ * things up as they were before suspend.
+ */
+ idle->sched_priv = SCHED_OP(&ops, alloc_vdata, idle,
+ idle->domain->sched_priv);
+ if ( idle->sched_priv == NULL )
+ return -ENOMEM;
+ }
+
if ( (ops.alloc_pdata != NULL) &&
((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
return -ENOMEM;
@@ -1393,9 +1412,13 @@ static void cpu_schedule_down(unsigned int cpu)
struct schedule_data *sd = &per_cpu(schedule_data, cpu);
struct scheduler *sched = per_cpu(scheduler, cpu);
+ SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
if ( sd->sched_priv != NULL )
SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
+ idle_vcpu[cpu]->sched_priv = NULL;
+ sd->sched_priv = NULL;
+
kill_timer(&sd->s_timer);
}
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers. 2015-11-13 10:08 [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli @ 2015-11-13 10:41 ` Jan Beulich 2015-11-13 12:30 ` Dario Faggioli 2015-11-13 15:19 ` Juergen Gross 1 sibling, 1 reply; 7+ messages in thread From: Jan Beulich @ 2015-11-13 10:41 UTC (permalink / raw) To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross >>> On 13.11.15 at 11:08, <dario.faggioli@citrix.com> wrote: > Fix things by properly deallocating scheduler specific > data of the pCPU's pool scheduler during pCPU teardown, > and re-allocating them --always for &ops-- during pCPU > bringup. Avoiding this was done for a reason iirc: What if one such allocation fails (e.g. due to heap fragmentation resulting from the allocation order not being the exact inverse of the freeing order)? Jan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers. 2015-11-13 10:41 ` Jan Beulich @ 2015-11-13 12:30 ` Dario Faggioli 2015-11-13 12:52 ` Jan Beulich 0 siblings, 1 reply; 7+ messages in thread From: Dario Faggioli @ 2015-11-13 12:30 UTC (permalink / raw) To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross [-- Attachment #1.1: Type: text/plain, Size: 4426 bytes --] On Fri, 2015-11-13 at 03:41 -0700, Jan Beulich wrote: > > > > On 13.11.15 at 11:08, <dario.faggioli@citrix.com> wrote: > > Fix things by properly deallocating scheduler specific > > data of the pCPU's pool scheduler during pCPU teardown, > > and re-allocating them --always for &ops-- during pCPU > > bringup. > > Avoiding this was done for a reason iirc: What if one such allocation > fails (e.g. due to heap fragmentation resulting from the allocation > order not being the exact inverse of the freeing order)? > I don't think I'm getting the point. cpu_schedule_up() allocates, with the alloc_pdata hook, stuff that was freed by cpu_schedule_down (with the free_pdata hook) already. It actually allocates two things, the other being the entire idle vCPU of the pCPU, but that is only if idle_vcpu[cpu] is NULL, which should not be happening during suspend/resume. In any case, if one of these allocations fails, it already returns ENOMEM. What happens then is, of course, caller's call. If it happens during boot for pCPU 0, we BUG(). If it happens with non-boot pCPUs (either at boot, during hotplug or during suspend/resume), we fail the bringup of that pCPU at the CPU_UP_PREPARE phase. This patch introduces one more occasion for the failure to occur, sure, if there's no (proper) memory. Without this patch the system crashes already, and even if there is enough (proper) memory, thuogh. Furthermore, at a later point on the pCPU bringup path, schedule_cpu_switch() is called. In the scenario described in the changelog, what it will do is deallocating the existing scheduler specific per-vCPU data, and re-allocating a new one. If the allocation fails at that point, what happens is that we fail the bringup at the CPU_ONLINE stage and (in cpu_up()) we BUG(). This does not really look better than what happens with the patch applied to me... Quite the opposite, TBH. Actually, re-looking at schedule_cpu_switch(), the situation is probably even worse than how I described it in the changelog! In fact, still in the scenario introduced there, when we actually get to call schedule_cpu_switch() (during CPU_ONLINE), the pCPU --let it be pCPU 6-- is in the following state: - it's still formally in cpupool1, but it's scheduler is set to &ops, i.e., to Credit1, - idle_vcpu[6].sched_priv points to Credit2 vCPU data, - it is being switched to pool1, with ops == Credit2. So, schedule_cpu_switch() itself will: - allocate a new Credit2 per vCPU data structure, - make idle_vcpu[6].sched_priv point such new structure, - try to deallocate the still existing Credit2 per vCPU data structure with Credit1's (old_ops in the function) free_vdata hook!! - bad things (or so I expect!) Note, in particular, the the next to last item: we'd be calling Credit1's csched_free_vdata() to deallocate something that was allocated with Credit2's csched_alloc_vdata()... Which looks like another rather interesting bug to me (I haven't been able to trigger it, most likely because the one shown in the changelog manifests first, but I can try, if we're curios :-D). So, in summary, this patch fixes two bugs, one that is showing and one latent... Not bad! :-) Fact is, there is an inconsistency between the pCPU's scheduler (set in cpu_schedule_up(), to Credit1, in the example) and the pCPU's idle vCPU's private data (Credit2 data in the example), which, if left in place, could explode somewhere else, at some future point. In my opinion, the best solution is to remove such inconsistency, and that's what the patch tries to do. Another idea could be trying to completely and fully initialize the _proper_ scheduler for the pCPU in cpu_schedule_up(), but it's not guaranteed that this will be doable for all schedulers; e.g., it is impossible for Credit2. Maybe, but I'd have to try, since it's only the inconsistency in the status of the idle vCPU that triggers the bug, we can at least set that straight in cpu_schedule_up()... As I said, I'd have to check, and I'm fine with trying, if it is deemed worthwhile. Let me know what you think. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers. 2015-11-13 12:30 ` Dario Faggioli @ 2015-11-13 12:52 ` Jan Beulich 2015-11-13 14:25 ` Dario Faggioli 0 siblings, 1 reply; 7+ messages in thread From: Jan Beulich @ 2015-11-13 12:52 UTC (permalink / raw) To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross >>> On 13.11.15 at 13:30, <dario.faggioli@citrix.com> wrote: > On Fri, 2015-11-13 at 03:41 -0700, Jan Beulich wrote: >> > > > On 13.11.15 at 11:08, <dario.faggioli@citrix.com> wrote: >> > Fix things by properly deallocating scheduler specific >> > data of the pCPU's pool scheduler during pCPU teardown, >> > and re-allocating them --always for &ops-- during pCPU >> > bringup. >> >> Avoiding this was done for a reason iirc: What if one such allocation >> fails (e.g. due to heap fragmentation resulting from the allocation >> order not being the exact inverse of the freeing order)? >> > I don't think I'm getting the point. > > cpu_schedule_up() allocates, with the alloc_pdata hook, stuff that was > freed by cpu_schedule_down (with the free_pdata hook) already. If there is no change in the allocation pattern (i.e. including the order of frees and allocs), then I'm sorry for the noise. I understood the description of the change differently then. > It actually allocates two things, the other being the entire idle vCPU > of the pCPU, but that is only if idle_vcpu[cpu] is NULL, which should > not be happening during suspend/resume. > > In any case, if one of these allocations fails, it already returns > ENOMEM. What happens then is, of course, caller's call. If it happens > during boot for pCPU 0, we BUG(). If it happens with non-boot pCPUs > (either at boot, during hotplug or during suspend/resume), we fail the > bringup of that pCPU at the CPU_UP_PREPARE phase. Except that after resume the system should get into the same state it was before suspend, i.e. in particular same number of CPUs. Imo there simply shouldn't be any allocations on the resume path - everything should be possible to be picked up as it was left during suspend. Along the lines of e.g. the handling of affinities we once had (and maybe still have), getting saved away and restored instead of trashed. > So, in summary, this patch fixes two bugs, one that is showing and one > latent... Not bad! :-) Agreed. > Fact is, there is an inconsistency between the pCPU's scheduler (set in > cpu_schedule_up(), to Credit1, in the example) and the pCPU's idle > vCPU's private data (Credit2 data in the example), which, if left in > place, could explode somewhere else, at some future point. In my > opinion, the best solution is to remove such inconsistency, and that's > what the patch tries to do. Removing the inconsistency is certainly a requirement. The question just is how to do so without risking system health in other ways. Jan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers. 2015-11-13 12:52 ` Jan Beulich @ 2015-11-13 14:25 ` Dario Faggioli 2015-11-13 14:47 ` Jan Beulich 0 siblings, 1 reply; 7+ messages in thread From: Dario Faggioli @ 2015-11-13 14:25 UTC (permalink / raw) To: Jan Beulich; +Cc: George Dunlap, xen-devel, Juergen Gross [-- Attachment #1.1: Type: text/plain, Size: 6859 bytes --] On Fri, 2015-11-13 at 05:52 -0700, Jan Beulich wrote: >>> On 13.11.15 at 13:30, <dario.faggioli@citrix.com> wrote: > > cpu_schedule_up() allocates, with the alloc_pdata hook, stuff that > > was > > freed by cpu_schedule_down (with the free_pdata hook) already. > > If there is no change in the allocation pattern (i.e. including the > order of frees and allocs), then I'm sorry for the noise. I > understood > the description of the change differently then. > There are changes. So, here it is the situation. Right now: After this patch: --------- ----------------- * Boot / CPU switch pools: * Boot / CPU switch pools: cpu_schedule_up(): cpu_schedule_up():= v0 = alloc_vdata() v0 = alloc_vdata() p0 = alloc_pdata() p0 = alloc_pdata() * Suspend: * Suspend: cpu_schedule_down() cpu_schedule_down(): free_pdata(p0) free_vdata(v0) free_pdata(p0) [x] * Resume: * Resume: cpu_schedule_up() cpu_schedule_up(): p1 = alloc_pdata() v1 = alloc_vdata() p1 = alloc_pdata() [xx] [*] schedule_cpu_switch() schedule_cpu_switch(): p2 = alloc_pdata() p2 = alloc_pdata() v1 = alloc_vdata() v2 = alloc_vdata() free_vdata(v0) [**] free_vdata(v1) free_pdata(p1) free_pdata(p1) [*] BUG, the one described in the changelog, if scheduling happens at this point (I can reproduce this with 100% reliability) [*] LATENT BUG, as it is possible that the free_vdata() function called here is not the counterpart of the alloc_vdata() function used for allocating v0 So, there are allocations already in the resume path. With this patch, there is one more free in the suspend path and one more alloc in the resume path. I can change the order of operations at [x] and/or [xx] (and send a v2), e.g., make things look like this, if that helps,: * Suspend: cpu_schedule_down(): free_pdata(p0) free_vdata(v0) * Resume: cpu_schedule_up(): v1 = alloc_vdata() p1 = alloc_pdata() Would that work? I don't think I neither can easily get away with the bug, nor eliminate the explained inconsistency, without that additional free-alloc pair. > > In any case, if one of these allocations fails, it already returns > > ENOMEM. What happens then is, of course, caller's call. If it > > happens > > during boot for pCPU 0, we BUG(). If it happens with non-boot pCPUs > > (either at boot, during hotplug or during suspend/resume), we fail > > the > > bringup of that pCPU at the CPU_UP_PREPARE phase. > > Except that after resume the system should get into the same state > it was before suspend, i.e. in particular same number of CPUs. > And those same CPUs being in the same pools, which is where this all springs from. That being said, I totally agree. But if something unexpected happens (which is what we are talking about), like being in lack of memory, is there any value in "limiting the damage"? I think there is, and that is why I looked at and reported the failure path in details, when answering your "what if alloc fail" question. If there is no point in that, why do we handle the error and rollback the CPU bringup in cpu_up() at all? We could just, at least in case of resume, BUG_ON() every time we don't get a NOTIFY_DONE result... > Imo > there simply shouldn't be any allocations on the resume path - > everything should be possible to be picked up as it was left during > suspend. > Agreed again. I'm not sure to what extent we can apply this to scheduling per pCPU and per vCPU data. For example, per vCPU data was not being deallocated (before my patch, I mean), but per pCPU data was. I'm not sure why exactly this was the case in the first place, but I guess it has to do with the fact that we, OTOH, want to actually deallocate as much stuff as possible in the case of CPU hot unplug, and parts of this code is shared between the two. But it is also, IMO, related to the above reasoning on error handling. I mean, if we just don't kill the per pCPU data the scheduler keeps for CPU x, and then, during resume, CPU x fails to come up, what happens? We risk leaking that data or, worse, failing at making the schedule aware that he does not really have CPU x available any longer. So, again, if we value "graceful" error compensation, I see some of these re-allocation necessary. Perhaps, the work I'm doing in another series of separating scheduler per pCPU data allocation and initialization can be helpful (not against the risk of leaks, though). > > Fact is, there is an inconsistency between the pCPU's scheduler > > (set in > > cpu_schedule_up(), to Credit1, in the example) and the pCPU's idle > > vCPU's private data (Credit2 data in the example), which, if left > > in > > place, could explode somewhere else, at some future point. In my > > opinion, the best solution is to remove such inconsistency, and > > that's > > what the patch tries to do. > > Removing the inconsistency is certainly a requirement. The question > just is how to do so without risking system health in other ways. > I had a look. For instance, in Credit2, the scheduler proper initialization happens only during CPU_STARTING (and needs to stay there), so I can't attach the pCPU to a Credit2 instance during CPU_UP_PREPARE, and claim that it's ready for scheduling. Avoiding deallocating stuff, apart from the dissertation on importance of error handling, would at least require treating the CPU hotplug and suspend/resume cases differently, which doesn't look ideal. Putting something together that would hold up until proper setup happens (namely, during the time between cpu_schedule_up() and schedule_cpu_switch()) is also less trivial than I thought. For instance, I'd need to access per_cpu(cpupool,cpu) from cpu_schedule_up(), which, from a quick test, it seems I can't. I don't think it's worth to put much effort in this, considering that the result would be an hack anyway! So, all in all, the patch (maybe after reordering the free-s?) remains the best way forward, IMO. It's possible that, in future, we'' be able to improve things toward the 'no allocs in resume path' goal, and I'll keep an eye out for that, when touching related areas. Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] [-- Attachment #2: Type: text/plain, Size: 126 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers. 2015-11-13 14:25 ` Dario Faggioli @ 2015-11-13 14:47 ` Jan Beulich 0 siblings, 0 replies; 7+ messages in thread From: Jan Beulich @ 2015-11-13 14:47 UTC (permalink / raw) To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross >>> On 13.11.15 at 15:25, <dario.faggioli@citrix.com> wrote: > So, all in all, the patch (maybe after reordering the free-s?) remains > the best way forward, IMO. It's possible that, in future, we'' be able > to improve things toward the 'no allocs in resume path' goal, and I'll > keep an eye out for that, when touching related areas. Okay, I think you convinced me. Jan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers. 2015-11-13 10:08 [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli 2015-11-13 10:41 ` Jan Beulich @ 2015-11-13 15:19 ` Juergen Gross 1 sibling, 0 replies; 7+ messages in thread From: Juergen Gross @ 2015-11-13 15:19 UTC (permalink / raw) To: Dario Faggioli, xen-devel; +Cc: George Dunlap On 13/11/15 11:08, Dario Faggioli wrote: > In fact, with 2 cpupools, one (the default) Credit and > one Credit2 (with at least 1 pCPU in the latter), trying > a (e.g., ACPI S3) suspend/resume crashes like this: > > (XEN) [ 150.587779] ----[ Xen-4.7-unstable x86_64 debug=y Not tainted ]---- > (XEN) [ 150.587783] CPU: 6 > (XEN) [ 150.587786] RIP: e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d > (XEN) [ 150.587796] RFLAGS: 0000000000010086 CONTEXT: hypervisor > (XEN) [ 150.587801] rax: ffff83031fa3c020 rbx: ffff830322c1b4b0 rcx: 0000000000000000 > (XEN) [ 150.587806] rdx: ffff83031fa78000 rsi: 000000000000000a rdi: ffff82d0802a9788 > (XEN) [ 150.587811] rbp: ffff83031fa7fe20 rsp: ffff83031fa7fd30 r8: ffff83031fa80000 > (XEN) [ 150.587815] r9: 0000000000000006 r10: 000000000008f7f2 r11: 0000000000000006 > (XEN) [ 150.587819] r12: ffff8300dbdf3000 r13: ffff830322c1b4b0 r14: 0000000000000006 > (XEN) [ 150.587823] r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026e0 > (XEN) [ 150.587827] cr3: 00000000dbaa8000 cr2: 0000000000000000 > (XEN) [ 150.587830] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) [ 150.587835] Xen stack trace from rsp=ffff83031fa7fd30: > ... ... ... > (XEN) [ 150.587962] Xen call trace: > (XEN) [ 150.587966] [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d > (XEN) [ 150.587974] [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635 > (XEN) [ 150.587979] [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d > (XEN) [ 150.587983] [<ffff82d08012dc6e>] do_softirq+0x13/0x15 > (XEN) [ 150.587988] [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b > (XEN) [ 151.272182] > (XEN) [ 151.274174] **************************************** > (XEN) [ 151.279624] Panic on CPU 6: > (XEN) [ 151.282915] Xen BUG at sched_credit.c:655 > (XEN) [ 151.287415] **************************************** > > During suspend, the pCPUs are not removed from their > pools with the standard procedure (which would involve > schedule_cpu_switch(). During resume, they: > 1) are assigned to the default cpupool (CPU_UP_PREPARE > phase); > 2) are moved to the pool they were in before suspend, > via schedule_cpu_switch() (CPU_ONLINE phase) > > During resume, scheduling (even if just the idle loop) > can happen right after the CPU_STARTING phase(before > CPU_ONLINE), i.e., before the pCPU is put back in its > pool. In this case, it is the default pool'sscheduler > that is invoked (Credit1, in the example above). But, > during suspend, the Credit2 specific vCPU data is not > being freed, and Credit1 specific vCPU data is not > allocated, during resume. > > Therefore, Credit1 schedules on pCPUs whose idle vCPU's > sched_priv points to Credit2 vCPU data, and we crash. > > Fix things by properly deallocating scheduler specific > data of the pCPU's pool scheduler during pCPU teardown, > and re-allocating them --always for &ops-- during pCPU > bringup. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Juergen ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-11-13 15:20 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-11-13 10:08 [PATCH] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli 2015-11-13 10:41 ` Jan Beulich 2015-11-13 12:30 ` Dario Faggioli 2015-11-13 12:52 ` Jan Beulich 2015-11-13 14:25 ` Dario Faggioli 2015-11-13 14:47 ` Jan Beulich 2015-11-13 15:19 ` Juergen Gross
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.