From: "marmarek@invisiblethingslab.com" <marmarek@invisiblethingslab.com>
To: Dario Faggioli <dfaggioli@suse.com>
Cc: Juergen Gross <JGross@suse.com>,
"frederic.pierret@qubes-os.org" <frederic.pierret@qubes-os.org>,
"George.Dunlap@citrix.com" <George.Dunlap@citrix.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>
Subject: Re: Recent upgrade of 4.13 -> 4.14 issue
Date: Sat, 31 Oct 2020 05:08:17 +0100
Message-ID: <20201031040817.GG1447@mail-itl>
In-Reply-To: <c17e7a152a7e1922bd9c729f70a96acf4ca5240b.camel@suse.com>
On Sat, Oct 31, 2020 at 04:27:58AM +0100, Dario Faggioli wrote:
> On Sat, 2020-10-31 at 03:54 +0100, marmarek@invisiblethingslab.com
> wrote:
> > On Sat, Oct 31, 2020 at 02:34:32AM +0000, Dario Faggioli wrote:
> > (XEN) *** Dumping CPU7 host state: ***
> > (XEN) Xen call trace:
> > (XEN) [<ffff82d040223625>] R _spin_lock+0x35/0x40
> > (XEN) [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
> > (XEN) [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
> > (XEN) [<ffff82d0402df160>] S context_switch+0x110/0xa60
> > (XEN) [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
> > (XEN) [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
> > (XEN) [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30
> >
> > And so on, for (almost?) all CPUs.
>
> Right. So, it seems like a live (I would say) lock. It might happen on
> some resource which is shared among domains. And introduced (the
> livelock, not the resource or the sharing) in 4.14.
>
> Just giving a quick look, I see that vmx_do_resume() calls
> vmx_clear_vmcs() which calls on_selected_cpus() which takes the
> call_lock spinlock.
>
> And none of these seems to have received much attention recently.
>
> But this is just a really basic analysis!

I've looked at on_selected_cpus() and my understanding is this:
1. take the call_lock spinlock
2. store the function, its arguments and the set of target CPUs in a
   global "call_data" variable
3. ask the CPUs to execute that function (smp_send_call_function_mask() call)
4. wait for all requested CPUs to execute the function, still holding
   the spinlock
5. only then release the spinlock

So, if any CPU does not execute the requested function for any reason,
the CPU that called on_selected_cpus() will keep call_lock locked forever.
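
To make the five steps above concrete, here is a toy, single-threaded model
of them in Python. It is only a sketch of my reading of on_selected_cpus(),
not the Xen code: the class SmpCallModel, the "responsive" list and the
returned status strings are all made up for illustration. Where real code
would spin forever, the model returns a status string instead.

```python
import threading

# Hypothetical status values; real code would just spin at these points.
DONE = "done"
STUCK = "stuck in step 4, call_lock still held"
SPINNING = "spinning on call_lock"


class SmpCallModel:
    """Toy model of the five steps listed above (not Xen code)."""

    def __init__(self, ncpus):
        self.call_lock = threading.Lock()
        self.call_data = None
        # Hypothetical knob: False means "this CPU never runs the function".
        self.responsive = [True] * ncpus

    def on_selected_cpus(self, cpus, func):
        # Step 1: take call_lock (report instead of spinning if it is held).
        if not self.call_lock.acquire(blocking=False):
            return SPINNING
        # Step 2: publish function and target CPUs in the global call_data.
        pending = set(cpus)
        self.call_data = (func, pending)
        # Step 3: "send the IPI"; responsive CPUs run func and clear themselves.
        for cpu in sorted(pending):
            if self.responsive[cpu]:
                func(cpu)
                pending.discard(cpu)
        # Step 4: wait for every target CPU, still holding call_lock.
        if pending:
            return STUCK
        # Step 5: only now release call_lock.
        self.call_lock.release()
        return DONE
```

In this model, once one target CPU is marked unresponsive, the first call
returns STUCK with call_lock held, and every later call from any other CPU
returns SPINNING, which would match the pile-up in _spin_lock seen in the
traces.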

I don't see any CPU waiting in step 4, but I also don't see call traces
from CPU3 and CPU8 in the log - that's because they are in guest (dom0
here) context, right? I do see "guest state" dumps from them.

The only three CPUs that did log Xen call traces and are not waiting on
that spinlock are:
CPU0:
(XEN) Xen call trace:
(XEN) [<ffff82d040240f89>] R vcpu_unblock+0x9/0x50
(XEN) [<ffff82d0402e0171>] S vcpu_kick+0x11/0x60
(XEN) [<ffff82d0402259c8>] S tasklet.c#do_tasklet_work+0x68/0xc0
(XEN) [<ffff82d040225a59>] S tasklet.c#tasklet_softirq_action+0x39/0x60
(XEN) [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN) [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30
CPU4:
(XEN) Xen call trace:
(XEN) [<ffff82d040227043>] R set_timer+0x133/0x220
(XEN) [<ffff82d040234e90>] S credit.c#csched_tick+0/0x3a0
(XEN) [<ffff82d04022660f>] S timer.c#timer_softirq_action+0x9f/0x300
(XEN) [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN) [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20
CPU14:
(XEN) Xen call trace:
(XEN) [<ffff82d040222dc0>] R do_softirq+0/0x10
(XEN) [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20

I'm not sure whether any of those are related to that spinlock, the
on_selected_cpus() call, or anything like that...
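
For sorting a long dump into "spinning on the lock" and "doing something
else", a small script along these lines might help. It is a rough sketch
that assumes the log has the same shape as the excerpts quoted in this
mail: a "Dumping CPUn host state" header per CPU, followed by trace lines
where the entry marked R is the frame currently executing. The function
names top_frames() and not_spinning() are made up, and the CPU14 header
line in SAMPLE is written by analogy with the CPU7 one.

```python
import re

# Patterns derived from the trace lines quoted above.
CPU_RE = re.compile(r"Dumping CPU(\d+) host state")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([RS])\s+([^\s+]+)\+")


def top_frames(log_text):
    """Map CPU number to the symbol of its first 'R' frame."""
    frames = {}
    cpu = None
    for line in log_text.splitlines():
        m = CPU_RE.search(line)
        if m:
            cpu = int(m.group(1))
            continue
        m = FRAME_RE.search(line)
        if m and cpu is not None and m.group(1) == "R":
            # Keep only the first R frame seen for each CPU.
            frames.setdefault(cpu, m.group(2))
    return frames


def not_spinning(frames, lock_symbol="_spin_lock"):
    """Return the CPUs whose current frame is not the lock function."""
    return {cpu: sym for cpu, sym in frames.items() if sym != lock_symbol}


SAMPLE = """\
(XEN) *** Dumping CPU7 host state: ***
(XEN) Xen call trace:
(XEN) [<ffff82d040223625>] R _spin_lock+0x35/0x40
(XEN) [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN) *** Dumping CPU14 host state: ***
(XEN) Xen call trace:
(XEN) [<ffff82d040222dc0>] R do_softirq+0/0x10
(XEN) [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20
"""
```

Calling not_spinning(top_frames(SAMPLE)) leaves only CPU14 here. CPUs that
were in guest context log no Xen call trace, so they will not show up in
either group and need to be checked by hand.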
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?