From: Dario Faggioli <dario.faggioli@citrix.com>
To: "Wu, Feng" <feng.wu@intel.com>, Jan Beulich <JBeulich@suse.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	"keir@xen.org" <keir@xen.org>,
	"george.dunlap@eu.citrix.com" <george.dunlap@eu.citrix.com>,
	"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list
Date: Tue, 24 May 2016 16:02:27 +0200	[thread overview]
Message-ID: <1464098547.21930.107.camel@citrix.com> (raw)
In-Reply-To: <E959C4978C3B6342920538CF579893F0196ED481@SHSMSX103.ccr.corp.intel.com>


On Tue, 2016-05-24 at 10:07 +0000, Wu, Feng wrote:
> > See, for instance, cpu_disable_scheduler() in schedule.c. What we
> > do is
> > go over all the vcpus of all domains of either the system or the
> > cpupool, and force the ones that we found with v->processor set to
> > the
> > pCPU that is going down, to perform migration (system_state will be
> > different than SYS_STATE_suspend, and we hence take the 'else'
> > branch).
> > 
> > Considering that the pCPU is no longer part of the relevant
> > bitmask-s
> > during the migration, the vCPUs will figure out that they just
> > can't
> > stay there, and move somewhere else.
>
> Thanks a lot for the elaboration, it is really helpful.
> 
NP :-)

> > Note, however, that this happens for running and runnable vCPUs. 
>
> I don't quite understand this. Do you mean cpu_disable_scheduler()
> only handles running and runnable vCPUs? I tried to find some hints
> in the code, but I didn't get it. Could you please give some more
> information about this?
> 
It goes through all the vcpus of all domains, and does not check or
care whether they are running, runnable or blocked.

Let's look at this in some more detail. So, let's assume that
processor 5 is going away, and that you have the following vcpus
around:

 d0v0 : v->processor = 5, running on cpu 5
 d0v1 : v->processor = 4, running on cpu 4
 d1v0 : v->processor = 5, runnable but not running
 d2v3 : v->processor = 5, blocked

for d0v0, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d0v0->pause_flags);
    vcpu_sleep_nosync(d0v0);
      SCHED_OP(sleep, d0v0);
        csched_vcpu_sleep(d0v0)
          cpu_raise_softirq(5, SCHEDULE_SOFTIRQ);
    vcpu_migrate(d0v0);
      if ( v->is_running || ...) // assume v->is_running is true
        return
    ...
    ... <--- scheduling occurs on processor 5
    ...
    context_saved(d0v0)
      vcpu_migrate(d0v0);
          //is_running is 0, so _VPF_migrating gets cleared
        vcpu_move_locked(d0v0, new_cpu);
        vcpu_wake(d0v0);
          SCHED_OP(wake, d0v0);
            csched_vcpu_wake(d0v0)
              __runq_insert(d0v0);
              __runq_tickle(d0v0);

for d0v1, we do:
  cpu_disable_scheduler(5)
    if ( d0v1->processor != 5 )
      continue

for d1v0, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d1v0->pause_flags);
    vcpu_sleep_nosync(d1v0);
      SCHED_OP(sleep, d1v0);
        csched_vcpu_sleep(d1v0)
          __runq_remove(d1v0);
    vcpu_migrate(d1v0);
      if ( d1v0->is_running ||
           !test_and_clear_bit(_VPF_migrating, d1v0->pause_flags)
          // false, but clears the _VPF_migrating flag
      vcpu_move_locked(d1v0, new_cpu);
      vcpu_wake(d1v0);
        SCHED_OP(wake, d1v0);
          csched_vcpu_wake(d1v0)
            __runq_insert(d1v0);
            __runq_tickle(d1v0);

for d2v3, we do:
  cpu_disable_scheduler(5)
    set_bit(_VPF_migrating, d2v3->pause_flags);
    vcpu_sleep_nosync(d2v3);
      SCHED_OP(sleep, d2v3);
        csched_vcpu_sleep(d2v3)
[1]       // Nothing!
    vcpu_migrate(d2v3);
      if ( d2v3->is_running ||
           !test_and_clear_bit(_VPF_migrating, d2v3->pause_flags)
          // false, but clears the _VPF_migrating flag
[*]   vcpu_move_locked(d2v3, new_cpu);
      vcpu_wake(d2v3);
[2]     // Nothing!
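
For following the three traces above, this is more or less the
relevant part of vcpu_migrate() in schedule.c (a sketch: locking and
the logic that picks new_cpu are omitted):

  static void vcpu_migrate(struct vcpu *v)
  {
      unsigned int new_cpu; /* picked by the scheduler, omitted here */

      /*
       * If v is running, bail out and leave _VPF_migrating set:
       * context_saved() will call us again when v is descheduled.
       * Otherwise, test_and_clear_bit() clears the flag, and we
       * bail out only if it was not set in the first place.
       */
      if ( v->is_running ||
           !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
          return;

      vcpu_move_locked(v, new_cpu); /* this updates v->processor */
      vcpu_wake(v);                 /* NOP if v is still blocked */
  }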

> > If a
> > vCPU is blocked, there is nothing to do, and in fact, nothing
> > happens
> > (as vcpu_sleep_nosync() and vcpu_wake() are NOP in that case). 
>
> What do you mean by saying ' vcpu_sleep_nosync() and vcpu_wake()
> are NOP '?
> 
See [1] and [2] above.
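
To spell it out a bit more: for a blocked vcpu, neither call ends up
doing anything. In Credit, the sleep hook is basically this (a sketch
of csched_vcpu_sleep() from sched_credit.c, stats and assertions
trimmed):

  static void csched_vcpu_sleep(const struct scheduler *ops,
                                struct vcpu *vc)
  {
      struct csched_vcpu * const svc = CSCHED_VCPU(vc);

      if ( curr_on_cpu(vc->processor) == vc )
          /* vc is running right now: ask for a reschedule */
          cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
      else if ( __vcpu_on_runq(svc) )
          /* vc is runnable: take it off the runqueue */
          __runq_remove(svc);
      /* vc is blocked: neither branch fires ==> nothing happens [1] */
  }

And vcpu_wake() (in schedule.c) only invokes the scheduler's wake hook
if the vcpu is runnable, which a blocked vcpu is not (again a sketch,
with locking and runstate accounting trimmed):

  void vcpu_wake(struct vcpu *v)
  {
      if ( likely(vcpu_runnable(v)) )  /* false: _VPF_blocked is set */
          SCHED_OP(VCPU2OP(v), wake, v);
      else if ( !(v->pause_flags & VPF_blocked) )  /* false as well */
          /* offline runstate accounting, trimmed */;
      /* ==> for a blocked vcpu, nothing happens at all [2] */
  }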

> > For
> > those vCPUs, as soon as they wake up, they'll figure out that their
> > own
> > v->processor is not there any longer, and will move somewhere else.
>
> So basically, when a vCPU is blocked, removing 'v->processor' has no
> impact on the blocked vcpu. When the vCPU is woken up, it will find
> another pCPU to run on, since the original 'v->processor' is down and
> no longer in the cpu bitmask, right?
> 
Yes, that was my point.

_However_, as you can see at [*] above, it must be noted that even the
vcpus that blocked while running on a certain processor (5, in the
example) do get a chance to have their v->processor changed to
something that is still online (i.e., something different from 5), as
a consequence of that processor going away.
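
In fact, this is basically all that vcpu_move_locked() does (a sketch;
the urgent-vcpu accounting is trimmed):

  static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
  {
      /*
       * Let the scheduler adjust its own per-vcpu data via its
       * migrate hook, if it has one; either way, the net effect is
       * that v->processor ends up pointing at the (online) new_cpu.
       */
      if ( VCPU2OP(v)->migrate )
          SCHED_OP(VCPU2OP(v), migrate, v, new_cpu);
      else
          v->processor = new_cpu;
  }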

Whether this is useful/enough for you, I can't tell right now, off the
top of my head.

> > > > But this is not an issue in a non-pCPU-hotplug scenario.
> > > > 
> > It's probably an issue even if you remove a cpu from a cpupool (and
> > an even more "interesting" one, if you also manage to add it to
> > another pool in the meantime), isn't it?
>
> Yes, things become more complex in that case ....
> 
Well, can you confirm that we indeed have an issue there as well? I.e.,
can you test and report what happens if you move a cpu from pool A to
pool B while it still has vcpus from a domain that stays in pool A?

If there's transient suboptimal behavior, we can probably live with
that (depending on the specific characteristics of the transition, I'd
say). If things crash, we certainly want a fix!

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

