All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Dario Faggioli <dfaggioli@suse.com>,
	George Dunlap <george.dunlap@citrix.com>
Cc: "Jan Beulich" <jbeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Jürgen Groß" <jgross@suse.com>
Subject: Re: [Xen-devel] Xen crash after S3 suspend - Xen 4.13 and newer
Date: Tue, 11 Oct 2022 13:22:15 +0200	[thread overview]
Message-ID: <Y0VR53lPZbbmbR2L@mail-itl> (raw)
In-Reply-To: <e813f296-ba1d-7ff0-a84c-97737d7ac7a5@suse.com>

[-- Attachment #1: Type: text/plain, Size: 11103 bytes --]

On Tue, Sep 20, 2022 at 04:30:41PM +0200, Jan Beulich wrote:
> On 20.09.2022 12:22, Marek Marczykowski-Górecki wrote:
> > On Mon, Aug 22, 2022 at 12:00:27PM +0200, Marek Marczykowski-Górecki wrote:
> >> On Mon, Aug 22, 2022 at 11:53:50AM +0200, Jan Beulich wrote:
> >>> On 21.08.2022 18:14, Marek Marczykowski-Górecki wrote:
> >>>> On Sat, Oct 09, 2021 at 06:28:17PM +0200, Marek Marczykowski-Górecki wrote:
> >>>>> On Sun, Jan 31, 2021 at 03:15:30AM +0100, Marek Marczykowski-Górecki wrote:
> >>>>>> I'm resurrecting this thread as it was recently mentioned elsewhere. I
> >>>>>> can still reproduce the issue on the recent staging branch (9dc687f155).
> >>>>>>
> >>>>>> It fails after the first resume (not always, but frequent enough to
> >>>>>> debug it). At least one guest needs to be running - with just (PV) dom0
> >>>>>> the crash doesn't happen (at least for the ~8 times in a row I tried).
> >>>>>> If the first resume works, the second (almost?) always will fail but
> >>>>>> with a different symptoms - dom0 kernel lockups (at least some of its
> >>>>>> vcpus). I haven't debugged this one yet at all.
> >>>>>>
> >>>>>> Any help will be appreciated, I can apply some debug patches, change
> >>>>>> configuration etc.
> >>>>>
> >>>>> This still happens on 4.14.3. Maybe it is related to freeing percpu
> >>>>> areas, as it caused other issues with suspend too? Just a thought...
> >>>>
> >>>> I have reproduced this on current staging(*). And I can reproduce it
> >>>> reliably. And also, I got (I believe) closely related crash with credit1
> >>>> scheduler.
> >>>>
> >>>> (*) It isn't plain staging, it's one with my xhci console patches on
> >>>> top, including attempt to make it survive S3. I believe the only
> >>>> relevant part there is sticking set_timer() into console resume path (or
> >>>> just having a timer with rather short delay registered). The actual tree
> >>>> at https://github.com/marmarek/xen/tree/master-xue2-debug, including
> >>>> quite a lot of debug prints and debug hacks.
> >>>>
> >>>> Specific crash with credit2:
> >>
> >> (XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at common/sched/credit2.c:2274
> >> (XEN) ----[ Xen-4.17-unstable  x86_64  debug=y  Tainted:   C    ]----
> >> (XEN) CPU:    10
> >> (XEN) RIP:    e008:[<ffff82d040247a4d>] credit2.c#csched2_unit_wake+0x152/0x154
> >> (XEN) RFLAGS: 0000000000010083   CONTEXT: hypervisor (d0v0)
> >> (XEN) rax: ffff830251778230   rbx: ffff830251768cb0   rcx: 00000032111d6000
> >> (XEN) rdx: ffff8302515c1eb0   rsi: 0000000000000006   rdi: ffff830251769000
> >> (XEN) rbp: ffff8302515cfd90   rsp: ffff8302515cfd70   r8:  ffff830251769000
> >> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> >> (XEN) r12: ffff830251768dd0   r13: ffff8302515c1d00   r14: 0000000000000006
> >> (XEN) r15: ffff82d0405ddb40   cr0: 0000000080050033   cr4: 0000000000372660
> >> (XEN) cr3: 000000022f2a1000   cr2: ffff8881012738e0
> >> (XEN) fsb: 0000744bf6a0db80   gsb: ffff888255600000   gss: 0000000000000000
> >> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> >> (XEN) Xen code around <ffff82d040247a4d> (credit2.c#csched2_unit_wake+0x152/0x154):
> >> (XEN)  df e8 6f bf ff ff eb ad <0f> 0b f3 0f 1e fa 55 48 89 e5 41 57 41 56 41 55
> >> (XEN) Xen stack trace from rsp=ffff8302515cfd70:
> >> (XEN)    ffff83025174b000 ffff830251768cb0 ffff830251778270 ffff82d0405c4298
> >> (XEN)    ffff8302515cfdd8 ffff82d04024fcb8 0000000000000202 ffff830251778270
> >> (XEN)    ffff83025174b000 0000000000000001 ffff830251769018 0000000000000000
> >> (XEN)    0000000000000000 ffff8302515cfe48 ffff82d04020a8c9 ffff8882556aedc0
> >> (XEN)    0000000000000003 00001910537e623e 0000000b988f78a6 0000000059d4a716
> >> (XEN)    00001901f30fa41e 0000000217f96af6 0000000000000000 ffff83025174b000
> >> (XEN)    ffff830251756000 0000000000000002 0000000000000001 ffff8302515cfe70
> >> (XEN)    ffff82d0402f7968 ffff830251756000 ffff8302515cfef8 0000000000000018
> >> (XEN)    ffff8302515cfee8 ffff82d0402ec6de 0000000000000000 ffffffff82f157e0
> >> (XEN)    0000000000000000 0000000000000000 ffff8302515cfef8 0000000000000000
> >> (XEN)    0000000000000000 ffff8302515cffff ffff830251756000 0000000000000000
> >> (XEN)    0000000000000000 0000000000000000 0000000000000000 00007cfdaea300e7
> >> (XEN)    ffff82d0402012bd 0000000000000000 ffffffff82c51120 ffff88810036cf00
> >> (XEN)    0000000000000002 000000000001e120 0000000000000002 0000000000000246
> >> (XEN)    ffffffff82f157e0 0000000000000001 0000000000000000 0000000000000018
> >> (XEN)    ffffffff81e4a30a 0000000000000000 0000000000000002 0000000000000001
> >> (XEN)    0000010000000000 ffffffff81e4a30a 000000000000e033 0000000000000246
> >> (XEN)    ffffc9004aef7c18 000000000000e02b fb5ee398d214b10c eb5ef398c214a10c
> >> (XEN)    eb56f390c21ca104 ebd6f310c29ca184 0000e0100000000a ffff830251756000
> >> (XEN)    0000003211016000 0000000000372660 0000000000000000 80000002963e1002
> >> (XEN) Xen call trace:
> >> (XEN)    [<ffff82d040247a4d>] R credit2.c#csched2_unit_wake+0x152/0x154
> >> (XEN)    [<ffff82d04024fcb8>] F vcpu_wake+0xfd/0x267
> >> (XEN)    [<ffff82d04020a8c9>] F common_vcpu_op+0x178/0x5d1
> >> (XEN)    [<ffff82d0402f7968>] F do_vcpu_op+0x69/0x226
> >> (XEN)    [<ffff82d0402ec6de>] F pv_hypercall+0x575/0x657
> >> (XEN)    [<ffff82d0402012bd>] F lstar_enter+0x13d/0x150
> >> (XEN)
> >> (XEN)
> >> (XEN) ****************************************
> >> (XEN) Panic on CPU 10:
> >> (XEN) Assertion 'c2rqd(sched_unit_master(unit)) == svc->rqd' failed at common/sched/credit2.c:2274
> >> (XEN) ****************************************
> > 
> > Ok, I think I figured it out!
> > 
> > I added a function that verifies run queues of all the sched units, and
> > called it basically every other line on the resume path. The debug
> > function (if anybody is interested):
> > 
> >     void verify_sched_units(void)
> >     {   
> >         struct domain *d;
> >         const struct sched_unit *unit;
> >         
> >         for_each_domain ( d )
> >         {
> >             for_each_sched_unit ( d, unit )
> >             {
> >                 if ( c2rqd(sched_unit_master(unit)) != csched2_unit(unit)->rqd )
> >                 {
> >                     printk(XENLOG_WARNING "d%d sched unit %d: rq=%d, unit master %d, rq=%d\n",
> >                             d->domain_id, unit->unit_id,
> >                             csched2_unit(unit)->rqd ? csched2_unit(unit)->rqd->id : -1,
> >                             sched_unit_master(unit),
> >                             c2rqd(sched_unit_master(unit))->id);
> >                     WARN_ON(1);
> >                 }
> >             }
> >         }
> >     }
> > 
> > It appears that restore_vcpu_affinity() is responsible, specifically
> > this part:
> > 
> > 1216         /*
> > 1217          * Re-assign the initial processor as after resume we have no
> > 1218          * guarantee the old processor has come back to life again.
> > 1219          *
> > 1220          * Therefore, here, before actually unpausing the domains, we should
> > 1221          * set v->processor of each of their vCPUs to something that will
> > 1222          * make sense for the scheduler of the cpupool in which they are in.
> > 1223          */
> > ...
> > 1249         res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
> > 1250         sched_set_res(unit, res);
> > 1251 
> > 1252         spin_unlock_irq(lock);
> > 1253 
> > 1254         /* v->processor might have changed, so reacquire the lock. */
> > 1255         lock = unit_schedule_lock_irq(unit);
> > 1256         res = sched_pick_resource(unit_scheduler(unit), unit);
> > 1257         sched_set_res(unit, res);
> > 1258         spin_unlock_irq(lock);
> > 1259 
> > 1260         if ( old_cpu != sched_unit_master(unit) )
> > 1261             sched_move_irqs(unit);
> > 
> > It calls sched_set_res() directly, which assigns sched resources, but
> > does _not_ adjust runqueues (if new pcpu happen to be assigned to
> > another runqueue than the one from previous pcpu).
> > 
> > I have two (non exclusive) ideas here:
> > 1. If old_cpu is actually still available, do not move it at all.
> > 2. Use sched_migrate() instead of sched_set_res().
> > 
> > Here is the patch that fixes it for me:
> > ---8<---
> > diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> > index 83455fbde1c8..dcf202d8b307 100644
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -1246,19 +1246,29 @@ void restore_vcpu_affinity(struct domain *d)
> >              }
> >          }
> >  
> > -        res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
> > +        /* Prefer old cpu if available. */
> > +        if ( cpumask_test_cpu(old_cpu, cpumask_scratch_cpu(cpu)) )
> > +            res = get_sched_res(old_cpu);
> > +        else
> > +            res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
> >          sched_set_res(unit, res);
> >  
> >          spin_unlock_irq(lock);
> >  
> > -        /* v->processor might have changed, so reacquire the lock. */
> > -        lock = unit_schedule_lock_irq(unit);
> > -        res = sched_pick_resource(unit_scheduler(unit), unit);
> > -        sched_set_res(unit, res);
> > -        spin_unlock_irq(lock);
> > -
> > +        /*
> > +         * If different cpu was chosen, it was random, let scheduler do proper
> > +         * decision.
> > +         */
> >          if ( old_cpu != sched_unit_master(unit) )
> > +        {
> > +            /* v->processor might have changed, so reacquire the lock. */
> > +            lock = unit_schedule_lock_irq(unit);
> > +            res = sched_pick_resource(unit_scheduler(unit), unit);
> > +            sched_migrate(unit_scheduler(unit), unit, res->master_cpu);
> > +            spin_unlock_irq(lock);
> > +
> >              sched_move_irqs(unit);
> > +        }
> >      }
> >  
> >      rcu_read_unlock(&sched_res_rculock);
> > ---8<---
> > 
> > I have several doubts here:
> > 
> > 1. If old_cpu is available, is sched_set_res() needed at all?
> > 2. Should both calls be changed to sched_migrate()? Currently I changed
> >    only the second one, in case scheduler could be confused about
> >    old_cpu not being available anymore.
> > 3. Are there any extra locking requirements for sched_migrate() at this
> >    stage? The long comment above sched_unit_migrate_start() suggests
> >    there might be, but I'm not sure if that's really the case during
> >    resume.
> > 4. Related to the above - should thaw_domains() be modified to call
> >    restore_vcpu_affinity() for all domains first, and unpause only
> >    later? That could reduce locking requirements, I guess.
> 
> All questions primarily to the scheduler maintainers - forwarding
> accordingly.

Ping?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2022-10-11 11:22 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-18 14:16 [Xen-devel] Xen crash after S3 suspend - Xen 4.13 Marek Marczykowski-Górecki
2020-03-18 14:50 ` Andrew Cooper
2020-03-18 22:10   ` Marek Marczykowski-Górecki
2020-03-19  0:28     ` Dario Faggioli
2020-03-19  0:59       ` Marek Marczykowski-Górecki
2020-03-23  0:09       ` Marek Marczykowski-Górecki
2020-03-23  8:14         ` Jan Beulich
2020-09-29 14:27         ` Marek Marczykowski-Górecki
2020-09-29 15:07           ` Jürgen Groß
2020-09-29 15:16             ` Marek Marczykowski-Górecki
2020-09-29 15:27               ` Jürgen Groß
2021-01-31  2:15                 ` [Xen-devel] Xen crash after S3 suspend - Xen 4.13 and newer Marek Marczykowski-Górecki
2021-10-09 16:28                   ` Marek Marczykowski-Górecki
2022-08-21 16:14                     ` Marek Marczykowski-Górecki
2022-08-22  9:53                       ` Jan Beulich
2022-08-22 10:00                         ` Marek Marczykowski-Górecki
2022-09-20 10:22                           ` Marek Marczykowski-Górecki
2022-09-20 14:30                             ` Jan Beulich
2022-10-11 11:22                               ` Marek Marczykowski-Górecki [this message]
2022-10-14 16:42                             ` George Dunlap
2022-10-21  6:41                             ` Juergen Gross
2022-08-22 15:34                       ` Juergen Gross
2022-09-06 11:46                         ` Juergen Gross
2022-09-06 12:35                           ` Marek Marczykowski-Górecki
2022-09-07 12:21                             ` Dario Faggioli
2022-09-07 15:07                               ` marmarek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y0VR53lPZbbmbR2L@mail-itl \
    --to=marmarek@invisiblethingslab.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dfaggioli@suse.com \
    --cc=george.dunlap@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=jgross@suse.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.