All of lore.kernel.org
 help / color / mirror / Atom feed
From: "marmarek@invisiblethingslab.com" <marmarek@invisiblethingslab.com>
To: Dario Faggioli <dfaggioli@suse.com>
Cc: Juergen Gross <JGross@suse.com>,
	"frederic.pierret@qubes-os.org" <frederic.pierret@qubes-os.org>,
	"George.Dunlap@citrix.com" <George.Dunlap@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>
Subject: Re: Recent upgrade of 4.13 -> 4.14 issue
Date: Sat, 31 Oct 2020 03:54:42 +0100	[thread overview]
Message-ID: <20201031025442.GF1447@mail-itl> (raw)
In-Reply-To: <b1a18e6ed88db3c40a54c7ca15c3399bdc6f2b9c.camel@suse.com>

[-- Attachment #1: Type: text/plain, Size: 12436 bytes --]

On Sat, Oct 31, 2020 at 02:34:32AM +0000, Dario Faggioli wrote:
> On Tue, 2020-10-27 at 17:06 +0100, Frédéric Pierret wrote:
> > 
> > Ok the server got frozen just few minutes after my mail and I got
> > now:
> > 'r': https://gist.github.com/fepitre/78541f555902275d906d627de2420571
> >
> From the scheduler point of view, things seems fine:
> 
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=770188952085
> (XEN) Online Cpus: 0-15
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-15
> (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
> (XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
> (XEN) Active queues: 2
> (XEN) 	default-weight     = 256
> (XEN) Runqueue 0:
> (XEN) 	ncpus              = 8
> (XEN) 	cpus               = 0-7
> (XEN) 	max_weight         = 256
> (XEN) 	pick_bias          = 1
> (XEN) 	instload           = 7
> (XEN) 	aveload            = 2021119 (~770%)
> (XEN) 	idlers: 00000000,00000000
> (XEN) 	tickled: 00000000,00000000
> (XEN) 	fully idle cores: 00000000,00000000
> (XEN) Runqueue 1:
> (XEN) 	ncpus              = 8
> (XEN) 	cpus               = 8-15
> (XEN) 	max_weight         = 256
> (XEN) 	pick_bias          = 9
> (XEN) 	instload           = 8
> (XEN) 	aveload            = 2097259 (~800%)
> (XEN) 	idlers: 00000000,00000000
> (XEN) 	tickled: 00000000,00000200
> (XEN) 	fully idle cores: 00000000,00000000
> 
> The system is pretty busy, but not in overload.
> 
> Below we see that CPU 3 is running the idle vCPU, but it's not marked
> as neither idle nor tickled.
> 
> It may be running a tasklet (the one that dumps the debug key output, I
> guess).
> 
> Credits are fine, I don't see any strange values that may indicate
> anomalies or something.
> 
> All the CPUs are executing a vCPU, and there should be nothing that
> prevent them making progresses.
> 
> There is one vCPU which apparetnly want to run at 100% in pretty much
> all guests, and more than one in dom0.
> 
> And I think I saw some spin_lock() in the call stacks, in the partial
> report of '*' debug-key?

Yes, I see:

(XEN) *** Dumping CPU1 host state: ***
(...)
(XEN) Xen call trace:
(XEN)    [<ffff82d040223625>] R _spin_lock+0x35/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20
(...)
(XEN) *** Dumping CPU2 host state: ***
(XEN) Xen call trace:
(XEN)    [<ffff82d040223622>] R _spin_lock+0x32/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30

(XEN) *** Dumping CPU5 host state: ***
(XEN) Xen call trace:
(XEN)    [<ffff82d040223622>] R _spin_lock+0x32/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30

(XEN) *** Dumping CPU6 host state: ***
(XEN) Xen call trace:
(XEN)    [<ffff82d040223625>] R _spin_lock+0x35/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d0402d64e6>] S x86_64/entry.S#process_softirqs+0x6/0x20

(XEN) *** Dumping CPU7 host state: ***
(XEN) Xen call trace:
(XEN)    [<ffff82d040223625>] R _spin_lock+0x35/0x40
(XEN)    [<ffff82d0402233cd>] S on_selected_cpus+0x1d/0xc0
(XEN)    [<ffff82d040284aba>] S vmx_do_resume+0xba/0x1b0
(XEN)    [<ffff82d0402df160>] S context_switch+0x110/0xa60
(XEN)    [<ffff82d04024310a>] S core.c#schedule+0x1aa/0x250
(XEN)    [<ffff82d040222d4a>] S softirq.c#__do_softirq+0x5a/0xa0
(XEN)    [<ffff82d040291b6b>] S vmx_asm_do_vmentry+0x2b/0x30

And so on, for (almost?) all CPUs.

Note the '*' output is (I think) from a different instances of the
freeze, so cannot be correlated with other outputs...

> Maybe they're stuck in the kernel, not in Xen? Thoughs ?

Given the above spin locks, I don't think so. But also, even if they are
stuck in the kernel, it clearly happened after 4.13 -> 4.14 upgrade...

> (XEN) Domain info:
> (XEN) 	Domain: 0 w 256 c 0 v 16
> (XEN) 	  1: [0.0] flags=0 cpu=5 credit=10553147 [w=256] load=17122
> (~6%)
> (XEN) 	  2: [0.1] flags=0 cpu=4 credit=10570606 [w=256] load=13569
> (~5%)
> (XEN) 	  3: [0.2] flags=0 cpu=7 credit=10605188 [w=256] load=13465
> (~5%)
> (XEN) 	  4: [0.3] flags=2 cpu=11 credit=9998469 [w=256] load=262144
> (~100%)
> (XEN) 	  5: [0.4] flags=0 cpu=0 credit=10533686 [w=256] load=13619
> (~5%)
> (XEN) 	  6: [0.5] flags=a cpu=9 credit=1101 [w=256] load=0 (~0%)
> (XEN) 	  7: [0.6] flags=2 cpu=2 credit=10621802 [w=256] load=13526
> (~5%)
> (XEN) 	  8: [0.7] flags=2 cpu=1 credit=10670607 [w=256] load=13453
> (~5%)
> (XEN) 	  9: [0.8] flags=2 cpu=7 credit=10649858 [w=256] load=13502
> (~5%)
> (XEN) 	 10: [0.9] flags=0 cpu=3 credit=10550566 [w=256] load=13477
> (~5%)
> (XEN) 	 11: [0.10] flags=2 cpu=4 credit=10644321 [w=256] load=13539
> (~5%)
> (XEN) 	 12: [0.11] flags=2 cpu=1 credit=10602374 [w=256] load=13471
> (~5%)
> (XEN) 	 13: [0.12] flags=0 cpu=6 credit=10617262 [w=256] load=13801
> (~5%)
> (XEN) 	 14: [0.13] flags=2 cpu=8 credit=9998664 [w=256] load=262144
> (~100%)
> (XEN) 	 15: [0.14] flags=0 cpu=3 credit=10603305 [w=256] load=17020
> (~6%)
> (XEN) 	 16: [0.15] flags=0 cpu=5 credit=10591312 [w=256] load=13523
> (~5%)
> (XEN) 	Domain: 1 w 256 c 0 v 2
> (XEN) 	 17: [1.0] flags=2 cpu=6 credit=4916769 [w=256] load=262144
> (~100%)
> (XEN) 	 18: [1.1] flags=0 cpu=13 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 2 w 256 c 0 v 2
> (XEN) 	 19: [2.0] flags=2 cpu=5 credit=4982064 [w=256] load=262144
> (~100%)
> (XEN) 	 20: [2.1] flags=0 cpu=14 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 3 w 256 c 0 v 2
> (XEN) 	 21: [3.0] flags=2 cpu=1 credit=5200781 [w=256] load=262144
> (~100%)
> (XEN) 	 22: [3.1] flags=0 cpu=5 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 4 w 256 c 0 v 2
> (XEN) 	 23: [4.0] flags=12 cpu=0 credit=5395149 [w=256] load=262144
> (~100%)
> (XEN) 	 24: [4.1] flags=0 cpu=8 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 5 w 256 c 0 v 2
> (XEN) 	 25: [5.0] flags=2 cpu=2 credit=5306461 [w=256] load=262144
> (~100%)
> (XEN) 	 26: [5.1] flags=0 cpu=15 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 6 w 256 c 0 v 8
> (XEN) 	 27: [6.0] flags=12 cpu=10 credit=7915602 [w=256] load=262144
> (~100%)
> (XEN) 	 28: [6.1] flags=0 cpu=10 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 29: [6.2] flags=0 cpu=15 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 30: [6.3] flags=0 cpu=8 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 31: [6.4] flags=0 cpu=0 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 32: [6.5] flags=0 cpu=6 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 33: [6.6] flags=0 cpu=11 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 34: [6.7] flags=0 cpu=9 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 7 w 256 c 0 v 2
> (XEN) 	 35: [7.0] flags=2 cpu=4 credit=5297013 [w=256] load=262144
> (~100%)
> (XEN) 	 36: [7.1] flags=0 cpu=11 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 8 w 256 c 0 v 4
> (XEN) 	 37: [8.0] flags=0 cpu=14 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 38: [8.1] flags=2 cpu=7 credit=5240630 [w=256] load=262144
> (~100%)
> (XEN) 	 39: [8.2] flags=0 cpu=13 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 40: [8.3] flags=0 cpu=8 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 9 w 256 c 0 v 2
> (XEN) 	 41: [9.0] flags=0 cpu=0 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 42: [9.1] flags=12 cpu=13 credit=7910266 [w=256] load=262144
> (~100%)
> (XEN) 	Domain: 10 w 256 c 0 v 2
> (XEN) 	 43: [10.0] flags=12 cpu=12 credit=8045458 [w=256] load=262144
> (~100%)
> (XEN) 	 44: [10.1] flags=0 cpu=8 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 11 w 256 c 0 v 2
> (XEN) 	 45: [11.0] flags=12 cpu=14 credit=7575284 [w=256] load=262144
> (~100%)
> (XEN) 	 46: [11.1] flags=0 cpu=12 credit=10500000 [w=256] load=0
> (~0%)
> (XEN) 	Domain: 12 w 256 c 0 v 2
> (XEN) 	 47: [12.0] flags=2 cpu=15 credit=8014099 [w=256] load=262144
> (~100%)
> (XEN) 	 48: [12.1] flags=0 cpu=6 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	Domain: 13 w 256 c 0 v 2
> (XEN) 	 49: [13.0] flags=0 cpu=7 credit=10500000 [w=256] load=0 (~0%)
> (XEN) 	 50: [13.1] flags=0 cpu=15 credit=10500000 [w=256] load=0
> (~0%)
> (XEN) Runqueue 0:
> (XEN) CPU[00] runq=0, sibling={0}, core={0-7}
> (XEN) 	run: [4.0] flags=2 cpu=0 credit=5255200 [w=256] load=262144
> (~100%)
> (XEN) CPU[01] runq=0, sibling={1}, core={0-7}
> (XEN) 	run: [3.0] flags=2 cpu=1 credit=5057668 [w=256] load=262144
> (~100%)
> (XEN) CPU[02] runq=0, sibling={2}, core={0-7}
> (XEN) 	run: [5.0] flags=12 cpu=2 credit=5180785 [w=256] load=262144
> (~100%)
> (XEN) CPU[03] runq=0, sibling={3}, core={0-7}
> (XEN) CPU[04] runq=0, sibling={4}, core={0-7}
> (XEN) 	run: [7.0] flags=2 cpu=4 credit=5215323 [w=256] load=262144
> (~100%)
> (XEN) CPU[05] runq=0, sibling={5}, core={0-7}
> (XEN) 	run: [2.0] flags=2 cpu=5 credit=4816142 [w=256] load=262144
> (~100%)
> (XEN) CPU[06] runq=0, sibling={6}, core={0-7}
> (XEN) 	run: [1.0] flags=2 cpu=6 credit=4755772 [w=256] load=262144
> (~100%)
> (XEN) CPU[07] runq=0, sibling={7}, core={0-7}
> (XEN) 	run: [8.1] flags=12 cpu=7 credit=5175342 [w=256] load=262144
> (~100%)
> (XEN) RUNQ:
> (XEN) Runqueue 1:
> (XEN) CPU[08] runq=1, sibling={8}, core={8-15}
> (XEN) 	run: [0.13] flags=2 cpu=8 credit=9998664 [w=256] load=262144
> (~100%)
> (XEN) CPU[09] runq=1, sibling={9}, core={8-15}
> (XEN) 	run: [0.5] flags=a cpu=9 credit=1101 [w=256] load=0 (~0%)
> (XEN) CPU[10] runq=1, sibling={10}, core={8-15}
> (XEN) 	run: [6.0] flags=2 cpu=10 credit=7764532 [w=256] load=262144
> (~100%)
> (XEN) CPU[11] runq=1, sibling={11}, core={8-15}
> (XEN) 	run: [0.3] flags=2 cpu=11 credit=9998469 [w=256] load=262144
> (~100%)
> (XEN) CPU[12] runq=1, sibling={12}, core={8-15}
> (XEN) 	run: [10.0] flags=2 cpu=12 credit=7967846 [w=256] load=262144
> (~100%)
> (XEN) CPU[13] runq=1, sibling={13}, core={8-15}
> (XEN) 	run: [9.1] flags=12 cpu=13 credit=7832232 [w=256] load=262144
> (~100%)
> (XEN) CPU[14] runq=1, sibling={14}, core={8-15}
> (XEN) 	run: [11.0] flags=2 cpu=14 credit=7509378 [w=256] load=262144
> (~100%)
> (XEN) CPU[15] runq=1, sibling={15}, core={8-15}
> (XEN) 	run: [12.0] flags=2 cpu=15 credit=7971164 [w=256] load=262144
> (~100%)
> (XEN) RUNQ:
> (XEN) CPUs info:
> (XEN) CPU[00] current=d4v0, curr=d4v0, prev=NULL
> (XEN) CPU[01] current=d3v0, curr=d3v0, prev=NULL
> (XEN) CPU[02] current=d5v0, curr=d5v0, prev=NULL
> (XEN) CPU[03] current=d[IDLE]v3, curr=d[IDLE]v3, prev=NULL
> (XEN) CPU[04] current=d7v0, curr=d7v0, prev=NULL
> (XEN) CPU[05] current=d2v0, curr=d2v0, prev=NULL
> (XEN) CPU[06] current=d1v0, curr=d1v0, prev=NULL
> (XEN) CPU[07] current=d8v1, curr=d8v1, prev=NULL
> (XEN) CPU[08] current=d0v13, curr=d0v13, prev=NULL
> (XEN) CPU[09] current=d0v5, curr=d0v5, prev=NULL
> (XEN) CPU[10] current=d6v0, curr=d6v0, prev=NULL
> (XEN) CPU[11] current=d0v3, curr=d0v3, prev=NULL
> (XEN) CPU[12] current=d10v0, curr=d10v0, prev=NULL
> (XEN) CPU[13] current=d9v1, curr=d9v1, prev=NULL
> (XEN) CPU[14] current=d11v0, curr=d11v0, prev=NULL
> (XEN) CPU[15] current=d12v0, curr=d12v0, prev=NULL
> 



-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-10-31  2:55 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-26 13:37 Recent upgrade of 4.13 -> 4.14 issue Frédéric Pierret
2020-10-26 13:54 ` Andrew Cooper
2020-10-26 14:30   ` Jürgen Groß
2020-10-26 16:31     ` Dario Faggioli
2020-10-27  5:58       ` Jürgen Groß
2020-10-27  9:22         ` Dario Faggioli
2020-10-27 15:42           ` Frédéric Pierret
2020-10-27 16:06             ` Frédéric Pierret
2020-10-31  2:34               ` Dario Faggioli
2020-10-31  2:54                 ` marmarek [this message]
2020-10-31  3:27                   ` Dario Faggioli
2020-10-31  4:08                     ` marmarek
2020-10-31 15:04                       ` Frédéric Pierret
2020-11-03 13:23                       ` Frédéric Pierret
2020-11-03 14:15                         ` Dario Faggioli
2020-11-03 14:36                           ` Frédéric Pierret
2020-10-27 16:28           ` Frédéric Pierret
2020-10-27 16:46           ` Frédéric Pierret
2020-10-26 16:11   ` Frédéric Pierret
2020-10-26 17:54     ` Dario Faggioli
2020-10-26 19:10       ` Frédéric Pierret
     [not found] <mailman.2112.1604414193.711.xen-devel@lists.xenproject.org>
2020-12-15 19:08 ` Liwei
2020-12-16  8:12   ` Jan Beulich
2020-12-16 12:19     ` Liwei
2020-12-16 13:10       ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201031025442.GF1447@mail-itl \
    --to=marmarek@invisiblethingslab.com \
    --cc=George.Dunlap@citrix.com \
    --cc=JGross@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dfaggioli@suse.com \
    --cc=frederic.pierret@qubes-os.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.