All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dario Faggioli <dario.faggioli@citrix.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool
Date: Thu, 2 Jul 2015 18:01:58 +0200	[thread overview]
Message-ID: <1435852918.14347.38.camel@citrix.com> (raw)
In-Reply-To: <CAFLBxZbkGg5F+MXYAwGnua3xZBWSXSkTocf7E3_F7U6moxEkYQ@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 2987 bytes --]

On Thu, 2015-07-02 at 16:24 +0100, George Dunlap wrote:
> On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > Ideally, the pCPUs that are 'free', i.e., not assigned
> > to any cpupool, should not be considred by the scheduler
> > for load balancing or anything. In Credit1, we fail at
> > this, because of how we use cpupool_scheduler_cpumask().
> > In fact, for a free pCPU, cpupool_scheduler_cpumask()
> > returns a pointer to cpupool_free_cpus, and hence, near
> > the top of csched_load_balance():
> >
> >  if ( unlikely(!cpumask_test_cpu(cpu, online)) )
> >      goto out;
> >
> > is false (the pCPU _is_ free!), and we therefore do not
> > jump to the end right away, as we should. This, causes
> > the following splat when resuming from ACPI S3 with
> > pCPUs not assigned to any pool:
> 
> What I haven't quite twigged yet with regard to this specific bug is
> why csched_load_balance() is being run on this cpu at all if it hasn't
> been added to the cpupool yet.  
>
Because the cpu is online already. That happens in start_secondary(),
right after having notified the CPU_STARTING phase of CPU bringup.

OTOH, adding a cpu to a pool only happens during the CPU_ONLINE phase,
which is after that.

> AFAICT it's only called from
> csched_schedule() -- why is that being called on a cpu that isn't
> online yet?  
>
Because, sadly, it is online. :-/

> If we can't fix that before the code freeze (or if we
> wouldn't want to avoid it anyway), would it make more sense to check
> for that condition in csched_schedule() and just return the idle
> domain?
>
I tried to move things around (i.e., moving the assignment to a cpupool
ahead in the bringup process), but that introduces irq-safe vs. non
irq-safe locking issues (ISTR that was on the spin locks protecting the
memory being allocated from inside schedule_cpu_switch(), which is
called by the cpu to cpupool assignment routine).

So, yeah, it's something I also don't like much, but it's hard to fix
properly, even without considering 4.6's freeze. :-/

> (Or schedule() even?)
> 
Perhaps, but:
 - this is (I think) pretty orthogonal wrt this patch,
 - I tried something like that as well, but then I did not like having a
   check for "is this cpu outside of any cpupool?" in such code, and
   going through it all the time (and this is an hot path!), for the
   sake of a corner case (as I consider having cpus outside of any
   cpupool a transient situation, as it is system start).

The approach implemented seemed the best one in term of both how the
code looks and performs for the vast majority of use cases and of the
time.

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2015-07-02 16:02 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-25 12:15 [PATCH 0/4] xen: sched / cpupool: fixes and improvements, mostly for when suspend/resume is involved Dario Faggioli
2015-06-25 12:15 ` [PATCH 1/4] xen: sched: avoid dumping duplicate information Dario Faggioli
2015-06-26 13:43   ` Juergen Gross
2015-07-02 11:18   ` George Dunlap
2015-06-25 12:15 ` [PATCH 2/4] xen: x86 / cpupool: clear the proper cpu_valid bit on pCPU teardown Dario Faggioli
2015-06-25 14:20   ` Andrew Cooper
2015-06-25 15:04     ` Dario Faggioli
2015-06-25 15:52       ` Andrew Cooper
2015-06-25 16:13         ` Dario Faggioli
2015-06-25 16:39           ` Andrew Cooper
2015-06-26 13:54   ` Juergen Gross
2015-06-25 12:15 ` [PATCH 3/4] xen: credit1: properly deal with pCPUs not in any cpupool Dario Faggioli
2015-06-26 14:05   ` Juergen Gross
2015-07-02 15:24   ` George Dunlap
2015-07-02 16:01     ` Dario Faggioli [this message]
2015-07-02 16:14       ` George Dunlap
2015-06-25 12:15 ` [PATCH 4/4] xen: sched: get rid of cpupool_scheduler_cpumask() Dario Faggioli
2015-06-26 14:08   ` Juergen Gross
2015-06-26 14:57     ` Joshua Whitehead
2015-06-27 19:21   ` Meng Xu
2015-07-02 15:39   ` George Dunlap
2015-07-03  7:48     ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1435852918.14347.38.camel@citrix.com \
    --to=dario.faggioli@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=jgross@suse.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.