From: Juergen Gross <jgross@suse.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers.
Date: Mon, 16 Nov 2015 14:10:18 +0100
Message-ID: <5649D5BA.7080703@suse.com>
In-Reply-To: <20151113171006.15775.61105.stgit@Solace.station>

On 13/11/15 18:10, Dario Faggioli wrote:
> In fact, with 2 cpupools, one (the default) Credit and
> one Credit2 (with at least 1 pCPU in the latter), trying
> a (e.g., ACPI S3) suspend/resume crashes like this:
> 
> (XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) [  150.587783] CPU:    6
> (XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
> (XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
> (XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
> (XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
> (XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
> (XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
> (XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
> (XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
> ... ... ...
> (XEN) [  150.587962] Xen call trace:
> (XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
> (XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
> (XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
> (XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
> (XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
> (XEN) [  151.272182]
> (XEN) [  151.274174] ****************************************
> (XEN) [  151.279624] Panic on CPU 6:
> (XEN) [  151.282915] Xen BUG at sched_credit.c:655
> (XEN) [  151.287415] ****************************************
> 
> During suspend, the pCPUs are not removed from their
> pools with the standard procedure (which would involve
> schedule_cpu_switch()). During resume, they:
>  1) are assigned to the default cpupool (CPU_UP_PREPARE
>     phase);
>  2) are moved to the pool they were in before suspend,
>     via schedule_cpu_switch() (CPU_ONLINE phase).
> 
> During resume, scheduling (even if just the idle loop)
> can happen right after the CPU_STARTING phase (before
> CPU_ONLINE), i.e., before the pCPU is put back in its
> pool. In this case, it is the default pool's scheduler
> that is invoked (Credit1, in the example above). But
> the Credit2 specific vCPU data is not freed during
> suspend, and the Credit1 specific vCPU data is not
> allocated during resume.
> 
> Therefore, Credit1 schedules on pCPUs whose idle vCPU's
> sched_priv points to Credit2 vCPU data, and we crash.
> 
> Fix things by properly deallocating scheduler specific
> data of the pCPU's pool scheduler during pCPU teardown,
> and re-allocating them --always for &ops-- during pCPU
> bringup.
> 
> This also fixes another (latent) bug: it prevents, again
> in schedule_cpu_switch(), Credit1's free_vdata() from
> being used to deallocate data allocated with Credit2's
> alloc_vdata(). This is not easy to trigger, but only
> because the other bug shown above manifests first and
> crashes the host.
> 
> The downside of this patch is that it adds one more
> allocation on the resume path, which is not ideal. Still,
> there is no better way of fixing the described bugs at
> the moment. Removing (ideally all) allocations that happen
> during resume should still be pursued in the long
> run.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
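
For readers following the thread, the mismatch described in the quoted
commit message can be modelled with a small toy program. This is only a
sketch, not Xen code: the struct layouts, the owner tag, and the helpers
can_schedule(), cpu_teardown(), cpu_bringup() and cpu_switch() are
hypothetical, and alloc_vdata()/free_vdata() only borrow the names used
above, not their real signatures. credit1 plays the role of the default
scheduler (&ops).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only; every type and helper here is hypothetical. */
enum sched_id { SCHED_CREDIT1, SCHED_CREDIT2 };

/* Stands in for scheduler-specific vCPU data; the tag records who allocated it. */
struct vdata { enum sched_id owner; };

struct scheduler { enum sched_id id; };

struct pcpu {
    const struct scheduler *pool_sched; /* scheduler of the pCPU's pool */
    struct vdata *idle_priv;            /* models the idle vCPU's sched_priv */
};

static const struct scheduler credit1 = { .id = SCHED_CREDIT1 }; /* default */
static const struct scheduler credit2 = { .id = SCHED_CREDIT2 };

static struct vdata *alloc_vdata(const struct scheduler *s)
{
    struct vdata *v = malloc(sizeof(*v));
    if (v)
        v->owner = s->id;
    return v;
}

static void free_vdata(const struct scheduler *s, struct vdata *v)
{
    /* Freeing data with the wrong scheduler is the latent bug. */
    assert(v == NULL || v->owner == s->id);
    free(v);
}

/* Returns 1 only if scheduler s would be reading its own private data. */
static int can_schedule(const struct scheduler *s, const struct pcpu *c)
{
    return c->idle_priv != NULL && c->idle_priv->owner == s->id;
}

/* Teardown: free the data with the scheduler of the pool the pCPU is in. */
static void cpu_teardown(struct pcpu *c)
{
    free_vdata(c->pool_sched, c->idle_priv);
    c->idle_priv = NULL;
}

/* Bringup: always allocate for the default scheduler. */
static int cpu_bringup(struct pcpu *c)
{
    c->pool_sched = &credit1;
    c->idle_priv = alloc_vdata(&credit1);
    return c->idle_priv ? 0 : -1;
}

/* Move the pCPU to another pool: allocate new data, free old with old scheduler. */
static int cpu_switch(struct pcpu *c, const struct scheduler *new_sched)
{
    struct vdata *v = alloc_vdata(new_sched);
    if (!v)
        return -1;
    free_vdata(c->pool_sched, c->idle_priv);
    c->pool_sched = new_sched;
    c->idle_priv = v;
    return 0;
}
```

In this model, a pCPU that still carries Credit2 data fails
can_schedule(&credit1, ...), which corresponds to the window between
CPU_STARTING and CPU_ONLINE. After cpu_teardown() followed by
cpu_bringup(), the default scheduler sees its own data, and cpu_switch()
then moves the pCPU back to its pool without crossing allocators.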

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


Thread overview: 7+ messages
2015-11-13 17:10 [PATCH v2] xen: sched: fix (ACPI S3) resume with cpupools with different schedulers Dario Faggioli
2015-11-16 13:10 ` Juergen Gross [this message]
2015-11-24 15:32 ` George Dunlap
2015-11-24 17:14   ` Dario Faggioli
2015-12-07 12:21     ` George Dunlap
2015-12-10  8:42       ` Dario Faggioli
2015-12-10 15:13 ` George Dunlap
