From: Dario Faggioli <dario.faggioli@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>,
	Xen-devel List <xen-devel@lists.xen.org>
Subject: Re: Scheduler regression in 4.7
Date: Thu, 11 Aug 2016 16:28:12 +0200
Message-ID: <1470925692.6250.20.camel@citrix.com>
In-Reply-To: <eeaf7771-2612-fc18-d8fb-7fdbf4cea306@citrix.com>

On Thu, 2016-08-11 at 14:39 +0100, Andrew Cooper wrote:
> On 11/08/16 14:24, George Dunlap wrote:
> > On 11/08/16 12:35, Andrew Cooper wrote:
> > > The actual cause is _csched_cpu_pick() falling over LIST_POISON, which
> > > happened to occur at the same time as a domain was shutting down.  The
> > > instruction in question is `mov 0x10(%rax),%rax` which looks like
> > > reverse list traversal.
>
Thanks for the report.

> > Could you use addr2line or objdump -dl to get a better idea where the
> > #GP is happening?
> addr2line -e xen-syms-4.7.0-xs127493 ffff82d08012944f
> /obj/RPM_BUILD_DIRECTORY/xen-4.7.0/xen/common/sched_credit.c:775 (discriminator 1)
> 
> It will be IS_RUNQ_IDLE() which is the problem.
> 
OK, that does one step of list traversal (on the runq). What I didn't
understand from your report is what crashed, and when.
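
As background on the failure mode: a Linux/Xen-style doubly linked list
unlinks an entry by poisoning its next/prev pointers, so a traversal that
steps onto an already-removed entry loads a poison value instead of a
valid pointer, and the next dereference faults. The sketch below is a
minimal illustration only; the function names and the poison constants
are hypothetical and are not copied from Xen's list.h.

```c
#include <assert.h>
#include <stdint.h>

struct list_head { struct list_head *next, *prev; };

/* Illustrative poison values (assumption: NOT Xen's actual LIST_POISON*). */
#define SKETCH_POISON_NEXT ((struct list_head *)(uintptr_t)0x0100100100100100ULL)
#define SKETCH_POISON_PREV ((struct list_head *)(uintptr_t)0x0200200200200200ULL)

/* Empty list: the head points at itself in both directions. */
static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Append n just before the head h (i.e. at the tail). */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* True if e has been unlinked by list_del_poison() and not re-added. */
static int is_poisoned(const struct list_head *e)
{
    return e->next == SKETCH_POISON_NEXT || e->prev == SKETCH_POISON_PREV;
}

/* Unlink e from its list and poison its pointers, list_del()-style. */
static void list_del_poison(struct list_head *e)
{
    assert(!is_poisoned(e));      /* catch a double delete */
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = SKETCH_POISON_NEXT; /* any later e->next deref will fault */
    e->prev = SKETCH_POISON_PREV; /* any later e->prev deref will fault */
}
```

After adding entries a and b to a list and deleting a, the list itself
stays consistent (head -> b -> head), but anyone still holding a pointer
to a and following a->next or a->prev would dereference a poison value.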

IS_RUNQ_IDLE() was introduced a while back, and nothing like this has
ever been caught so far. George's patch makes
insert_vcpu()-->csched_vcpu_insert() call _csched_cpu_pick(), and in
4.7 that path is taken:
 1) during domain (well, vcpu) creation,
 2) when a domain is moved between cpupools.

AFAICR, during domain destruction we basically move the domain to
cpupool0 and, without a patch that I sent recently, that is always done
as a full-fledged cpupool movement, even if the domain is _already_ in
cpupool0. So, even if you are not using cpupools, since you mention
domain shutdown, we are probably looking at 2).

But this is the part I'm not sure I understood... Do you have enough
info to tell precisely when the crash manifests? Is it indeed during a
domain shutdown, or was it during a domain creation (sched_init_vcpu()
is in the stack trace... although I've read it's a non-debug one)? And
is it a 'regular' domain or dom0 that is shutting down/coming up?

The idea behind IS_RUNQ_IDLE() is that we need to know whether or not
there is anyone in a cpu's runq, in order to correctly initialize (and
hence avoid biasing) some load balancing calculations. I've never liked
the idea (let alone the code), but it's necessary (or, at least, I
don't see a sensible alternative).
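
To make the "one step of list traversal" concrete, here is a simplified
sketch of a runq-idle check. It assumes the check means "the runq is
empty, or its first element is the cpu's idle vcpu"; the struct, macro
and function names are hypothetical and this is not the actual
sched_credit.c code.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Hypothetical, stripped-down runq element (not Xen's struct csched_vcpu). */
struct sketch_vcpu {
    bool is_idle;               /* true for the cpu's idle vcpu */
    struct list_head runq_elem; /* linkage into a per-cpu runq */
};

/* Recover the enclosing struct from a pointer to its list member. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

/* Append n just before the head h (i.e. at the tail). */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/*
 * "Idle" runq: either empty, or its head element is the idle vcpu.
 * This follows runq->next once and dereferences the element it points
 * to; if that pointer were stale or poisoned, this is where it faults.
 */
static bool sketch_runq_is_idle(const struct list_head *runq)
{
    if (list_empty(runq))
        return true;
    return sketch_container_of(runq->next, struct sketch_vcpu,
                               runq_elem)->is_idle;
}
```

With this shape, an empty runq or one headed by the idle vcpu reports
idle, while a runq with a regular vcpu queued in front does not.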

The questions above aim at figuring out what state the runq could be
in, and why adding a call to csched_cpu_pick() from insert_vcpu() is
making things explode...

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
