Re: [Xen-devel] Xen crash on S3 resume on 4.13 and unstable if any CPU is re-offlined - Marek Marczykowski-Górecki

From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: "Jürgen Groß" <jgross@suse.com>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Michał Kowalczyk" <mkow@invisiblethingslab.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [Xen-devel] Xen crash on S3 resume on 4.13 and unstable if any CPU is re-offlined
Date: Sun, 5 Jan 2020 10:02:06 +0100
Message-ID: <20200105090206.GG1314@mail-itl>
In-Reply-To: <fe785b74-5e54-26e6-ffc6-6bc2741b35ee@suse.com>

On Sun, Jan 05, 2020 at 09:25:42AM +0100, Jürgen Groß wrote:
> On 05.01.20 08:39, Marek Marczykowski-Górecki wrote:
> > On Sun, Jan 05, 2020 at 12:42:30AM +0000, Andrew Cooper wrote:
> > > On 04/01/2020 15:30, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > I have a reliable crash on resume from S3. I can reproduce it on both
> > > > real hardware and nested within KVM, although call traces are different
> > > > between those platforms. In any case, it happens only if some CPU is to
> > > > be re-offlined after resume (smt=off and/or maxcpus=... options).
> > > > 
> > > > I think the crash from the real hardware gives more clues, but the one
> > > > from qemu may also be interesting, maybe it's even another bug?
> > > > 
> > > > The crash message (full console log attached):
...
> > > > (XEN) Xen call trace:
> > > > (XEN)    [<ffff82d08023beb7>] R schedule.c#cpu_schedule_callback+0xea/0x1a1
> > > > (XEN)    [<ffff82d080221289>] F notifier_call_chain+0x6b/0x96
> > > > (XEN)    [<ffff82d080203476>] F cpu.c#cpu_notifier_call_chain+0x1b/0x33
> > > > (XEN)    [<ffff82d080203550>] F cpu_down+0x5e/0x15c
> > > > (XEN)    [<ffff82d080203999>] F enable_nonboot_cpus+0x113/0x1fb
> > > > (XEN)    [<ffff82d0802e4240>] F power.c#enter_state_helper+0x107/0x51b
> > > > (XEN)    [<ffff82d08020828f>] F domain.c#continue_hypercall_tasklet_handler+0x8b/0xb7
> > > > (XEN)    [<ffff82d08023fd39>] F tasklet.c#do_tasklet_work+0x76/0xa9
> > > > (XEN)    [<ffff82d08024001a>] F do_tasklet+0x58/0x8a
> > > > (XEN)    [<ffff82d08027247a>] F domain.c#idle_loop+0x40/0x96
...
> > > > And the one from qemu:
> > > > 
> > > > (XEN) mce_intel.c:772: MCA Capability: firstbank 1, extended MCE MSR 0, SER
> > > > (XEN) Finishing wakeup from ACPI S3 state.
> > > > (XEN) Enabling non-boot CPUs  ...
> > > > (XEN) Assertion 'c2rqd(ops, sched_unit_master(unit)) == svc->rqd' failed at sched_credit2.c:2137
> > > > (XEN) ----[ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]----
> > > > (XEN) CPU:    1
> > > > (XEN) RIP:    e008:[<ffff82d08022fe1a>] sched_credit2.c#csched2_unit_wake+0x174/0x176
> > > > (XEN) RFLAGS: 0000000000010097   CONTEXT: hypervisor (d0v0)
> > > > (XEN) rax: ffff83013a7313e8   rbx: ffff83013a6bdf40   rcx: 0000000000000051
> > > > (XEN) rdx: ffff83013a731160   rsi: ffff83013a7310e0   rdi: 0000000000000003
> > > > (XEN) rbp: ffff83013a6f7d98   rsp: ffff83013a6f7d78   r8:  deadbeefdeadf00d
> > > > (XEN) r9:  deadbeefdeadf00d   r10: 0000000000000000   r11: 0000000000000000
> > > > (XEN) r12: ffff83013a6bc7e0   r13: ffff82d08043e720   r14: 0000000000000003
> > > > (XEN) r15: 00000003c5ffecac   cr0: 0000000080050033   cr4: 0000000000000660
> > > > (XEN) cr3: 000000004b005000   cr2: 0000000000000000
> > > > (XEN) fsb: 00007751649f4740   gsb: ffff888134a00000   gss: 0000000000000000
> > > > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> > > > (XEN) Xen code around <ffff82d08022fe1a> (sched_credit2.c#csched2_unit_wake+0x174/0x176):
> > > > (XEN)  ef e8 1e c1 ff ff eb a7 <0f> 0b 55 48 89 e5 41 57 41 56 41 55 41 54 53 48
> > > > (XEN) Xen stack trace from rsp=ffff83013a6f7d78:
> > > > (XEN)    ffff83013a6a3000 ffff83013a6bdf40 ffff83013a6bdf40 ffff83013a7313e8
> > > > (XEN)    ffff83013a6f7de8 ffff82d0802391f8 0000000000000202 ffff83013a7313e8
> > > > (XEN)    ffff83013a6c1018 0000000000000001 0000000000000000 0000000000000000
> > > > (XEN)    ffff83013a6c1018 ffff83013a6a3000 ffff83013a6f7e58 ffff82d08020906c
> > > > (XEN)    ffff82d08035d3d4 ffff82d08035d3c8 ffff82d08035d3d4 ffff82d08035d3c8
> > > > (XEN)    ffff82d08035d3d4 ffff82d08035d3c8 ffff82d08035d3d4 ffff83013a6f7ef8
> > > > (XEN)    0000000000000180 ffff83013a6aa000 deadbeefdeadf00d 0000000000000003
> > > > (XEN)    ffff83013a6f7ee8 ffff82d0803570c7 0000000000000001 0000000000000001
> > > > (XEN)    0000000000000000 deadbeefdeadf00d deadbeefdeadf00d ffff82d08035d3c8
> > > > (XEN)    ffff82d08035d3d4 ffff82d08035d3c8 ffff82d08035d3d4 ffff82d08035d3c8
> > > > (XEN)    ffff82d08035d3d4 ffff83013a6aa000 0000000000000000 0000000000000000
> > > > (XEN)    0000000000000000 0000000000000000 00007cfec59080e7 ffff82d08035d432
> > > > (XEN)    0000000000015120 0000000000000001 0000000000000000 ffff88813024a540
> > > > (XEN)    0000000000000000 0000000000000001 0000000000000246 0000000000140000
> > > > (XEN)    ffff8880bf7db000 ffffea0004be4508 0000000000000018 ffffffff8100130a
> > > > (XEN)    0000000000000000 0000000000000001 0000000000000001 0000010000000000
> > > > (XEN)    ffffffff8100130a 000000000000e033 0000000000000246 ffffc90000c97c98
> > > > (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
> > > > (XEN)    0000000000000000 0000e01000000001 ffff83013a6aa000 00000030ba196000
> > > > (XEN)    0000000000000660 0000000000000000 000000013a6e2000 0000040000000000
> > > > (XEN) Xen call trace:
> > > > (XEN)    [<ffff82d08022fe1a>] R sched_credit2.c#csched2_unit_wake+0x174/0x176
> > > > (XEN)    [<ffff82d0802391f8>] F vcpu_wake+0xea/0x4d8
> > > > (XEN)    [<ffff82d08020906c>] F do_vcpu_op+0x36f/0x687
> > > > (XEN)    [<ffff82d0803570c7>] F pv_hypercall+0x28f/0x57d
> > > > (XEN)    [<ffff82d08035d432>] F lstar_enter+0x112/0x120
> > > > (XEN)
> > > > (XEN)
> > > > (XEN) ****************************************
> > > > (XEN) Panic on CPU 1:
> > > > (XEN) Assertion 'c2rqd(ops, sched_unit_master(unit)) == svc->rqd' failed at sched_credit2.c:2137
> > > > (XEN) ****************************************
> > > 
> > > This looks very much like the core scheduling crash found on specific
> > > machines in S5.  From my analysis, it was a use-after-free on a
> > > scheduling resource.
> > > 
> > > Does switching back to thread mode (as opposed to core mode) make the
> > > crash go away?
> > 
> > It is already in thread mode (unless the default has changed).
> 
> Does the attached patch fix it for you?

Yes, it helps with the issue on the real hardware, thanks! On qemu it helps only
partially - I don't get the crash with "qemu ... -smp 4 -append maxcpus=1"
anymore, but still get it with just "qemu ... -smp 4". It looks like a
different issue.

> From f53e105a9789b6d268e7fe4d05e4b989b9143338 Mon Sep 17 00:00:00 2001
> From: Juergen Gross <jgross@suse.com>
> To: xen-devel@lists.xenproject.org
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Dario Faggioli <dfaggioli@suse.com>
> Date: Sun, 5 Jan 2020 09:21:41 +0100
> Subject: [PATCH] xen/sched: fix resuming from S3 with smt=0
> 
> When resuming from S3 and smt=0 or maxcpus= are specified we must not
> do anything in cpu_schedule_callback(). This is not true today for
> taking down a cpu during resume.
> 
> If anything goes wrong during resume all the scheduler related error
> handling is in cpupool.c, so we can just bail out early from
> cpu_schedule_callback() when suspending or resuming.
> 
> This fixes commit 0763cd2687897b55e7 ("xen/sched: don't disable
> scheduler on cpus during suspend").
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

> ---
>  xen/common/schedule.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e70cc70a65..54a07ff9e8 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -2562,6 +2562,13 @@ static int cpu_schedule_callback(
>      unsigned int cpu = (unsigned long)hcpu;
>      int rc = 0;
>  
> +    /*
> +     * All scheduler related suspend/resume handling needed is done in
> +     * cpupool.c.
> +     */
> +    if ( system_state > SYS_STATE_active )
> +        return NOTIFY_DONE;
> +
>      rcu_read_lock(&sched_res_rculock);
>  
>      /*
> @@ -2589,8 +2596,7 @@ static int cpu_schedule_callback(
>      switch ( action )
>      {
>      case CPU_UP_PREPARE:
> -        if ( system_state != SYS_STATE_resume )
> -            rc = cpu_schedule_up(cpu);
> +        rc = cpu_schedule_up(cpu);
>          break;
>      case CPU_DOWN_PREPARE:
>          rcu_read_lock(&domlist_read_lock);
> @@ -2598,13 +2604,10 @@ static int cpu_schedule_callback(
>          rcu_read_unlock(&domlist_read_lock);
>          break;
>      case CPU_DEAD:
> -        if ( system_state == SYS_STATE_suspend )
> -            break;
>          sched_rm_cpu(cpu);
>          break;
>      case CPU_UP_CANCELED:
> -        if ( system_state != SYS_STATE_resume )
> -            cpu_schedule_down(cpu);
> +        cpu_schedule_down(cpu);
>          break;
>      default:
>          break;
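
For readers skimming the diff, the effect of the early return can be
sketched outside of Xen. This is only an illustration, not the real
code: the enums below are simplified stand-ins (only the ordering
"suspend and resume compare greater than active" is taken from the
patch), the value of NOTIFY_DONE is an assumption, and
cpu_schedule_callback_sketch() and sched_calls are hypothetical names
that replace the real scheduler helpers with a counter.

```c
/*
 * Simplified stand-in for Xen's system state enum. Only the relative
 * order matters: suspend and resume must compare greater than active.
 */
enum system_states {
    SYS_STATE_boot,
    SYS_STATE_active,
    SYS_STATE_suspend,
    SYS_STATE_resume
};

/* Stand-ins for the notifier actions handled in the patch. */
enum cpu_action {
    CPU_UP_PREPARE,
    CPU_DOWN_PREPARE,
    CPU_DEAD,
    CPU_UP_CANCELED
};

#define NOTIFY_DONE 0 /* assumed value, for illustration only */

/* Counts how often the (hypothetical) scheduler work would have run. */
unsigned int sched_calls;

int cpu_schedule_callback_sketch(enum system_states state,
                                 enum cpu_action action)
{
    /* While suspending or resuming, leave everything to cpupool.c. */
    if ( state > SYS_STATE_active )
        return NOTIFY_DONE;

    switch ( action )
    {
    case CPU_UP_PREPARE:
    case CPU_DOWN_PREPARE:
    case CPU_DEAD:
    case CPU_UP_CANCELED:
        /* The real callback calls the per-action scheduler helper here. */
        sched_calls++;
        break;
    }

    return NOTIFY_DONE;
}
```

With this shape, a CPU_DEAD or CPU_DOWN_PREPARE notification that
arrives while the state is suspend or resume never reaches the
per-action code, which is why the individual system_state checks inside
the switch could be dropped.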

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?